Many years ago, on the day we decided to release a major upgrade to our product, a decision that of course required my approval as Director of QA, I was walking down the hallway and the CEO passed me.
He stopped me and said, “So I can assume we are shipping with no defects?”
Without thinking as much as I probably should have, I blithely answered, “Of course not. There’s no such thing as software without defects.”
He was stunned and appalled by my answer. He was so upset with me, he terminated our conversation and stormed away down the hallway.
At the time I chalked this up to the fact he was new to the software business and life cycle, having come from the publishing industry (long story—another time). Because everyone on the development team knew we were shipping with defects. But we knew what those defects were to begin with, and that made all the difference in our risk assessment.
I’m noticing though, with alarm, that recently this misapprehension on the part of my former CEO has become widespread in our industry. Bug-free, “zero defect software” is a new boast/mantra I’m seeing and hearing more and more. And not just from clueless CEOs (redundant, I know), but from people in the industry who should know better. And who in fact do. Which I find very troubling.
There is a sense in which "zero defect" thinking is not just mendacious, self-serving nonsense. In the west it is mostly just hot air and posturing for effect. But in Japan it is actually meaningful. And in all of its industries.
I worked in Japan for a year, and this blew my mind, but in a very good way. Because their definition of a "defect" is very different from ours. For example, if there's a gum wrapper left on the floor of the factory, that is considered a defect, as much as a problem with the machinery or the output of a process is. If we were really serious about "zero defects", we'd think about that issue the way the Japanese do. But we don't, so we won't.
What I was trying to say to that hapless CEO, perhaps too bluntly, is that any piece of modern commercial software is infinitely complex, and the ways in which it can interact with itself or its environment are therefore also infinite. It's only possible to find every defect if you have an infinite amount of time to test. But none of us is immortal.
Even if you could find every last defect the first time around, you wouldn't have time to fix them all and ship a product this century. Time-to-market is a real thing. So even if you found them all, you would still have to ship with some of them.
So the whole notion of “zero defect software” is inherently delusional to begin with.
Where Did The Notion Of Zero-Defect Software Come From?
The root of this delusion is the belief that QA's role is to "ensure" quality. It does no such thing. That is the task of Product Management and Engineering. The role of quality assurance is to *assess* how successful they've been at that.
But there is another pernicious layer to this problem.
Recently I was discussing this very issue with a software developer. He told me how the meaning of “zero defect software” had been explained to him: it means shipping with all the bugs fixed that the team decided it needed to fix in order to ship, and shipping with all the defects they decided not to fix. A definition he did not agree with, by the way.
Now this is clearly just definitional smoke and mirrors. You really are shipping with defects, and you know you are, but you are using the slogan of "zero defects" to hide that fact from yourselves—and management, of course. Who we all know are easily hypnotized by empty slogans that make them feel successful.
It’s selective defect reduction, nothing more.
How Do We Get Away From The Zero-Defect Ruse?
The foundation of this ruse is language itself, language that shapes and constrains our thinking before we can even begin to respond to what it is saying. It is this false language that creates the need to explain why it is wrong in the first place. And so we start out already backed into a corner by the vocabulary of a slogan.
This is such a common practice in software development. It is unfortunately not restricted to lying to ourselves about "zero defects". It infects every level of our thinking and doing, as testers, in our development process, in our methodologies, and in our metrics.
Look, let's start facing reality about what we are doing and not doing. And the first step in that recovery is using language that honestly describes what we are doing and not doing. Because if the very language we use to communicate those realities is itself fundamentally fraudulent, dishonest and delusional, we are not making high quality software. We are manufacturing propaganda.
There is another problem with the discourse of “zero defect software”. Even if it is a sincerely held goal, in the sense that people sincerely believe it is possible and are not just engaging in an exercise in cynicism, a more serious problem constrains the usefulness of the idea. That problem is this:
How could you possibly know, and prove, you’ve found “all” the defects there are to be found in the software under test? And not just the number of defects you’ve managed to find in the time allotted? How could you prove to yourselves that the set of bugs you’ve found is the complete set of bugs the software actually contains?
You can’t. Because that would require some process external to QA itself to determine this. Otherwise it’s just a circular argument. And what would that external process be?
And this is true no matter how many bugs you’ve found already. You could have found a million of them. That doesn’t prove there isn’t a bug one million and one lurking out there in the software shadows.
Is There Anything Valuable In The Zero-Defect Concept?
The glaring logical and epistemological problems with talking and thinking in terms of “zero defect software” are simply insurmountable. The phrase itself is unsalvageable, no matter how we reword it.
But if we look beyond the deceptive slogan, into what makes it attractive and conceptually valid to otherwise thoughtful people in the first place, we will find something of value to focus on.
That “something” is a different question, but one that answers the practical and emotional needs seemingly addressed by the false discourse of zero defect software, and in a more coherent and defensible way.
The issue that people in software are really trying to come to grips with here by recourse to self-deception is itself deceptively simple. Namely, “When do we know we can stop testing?”
It is the only question I ever ask of candidates for QA leadership positions in my organization. Because, really, that is the only question QA needs to answer. And if it can’t, QA is a complete waste of time. To answer that question, we have to first answer a prior question.
Zero Defect vs % Test Coverage
How do we know if the defects we have already found, however many they may be, can plausibly be taken to represent the risk envelope created by how the software under test was specified and engineered? Such that we can say with confidence that the remaining undetected defects pose an acceptable risk if we decide to release the software to the public?
Posed in this way, the question turns out not to be about defects at all. It is a question about test coverage. Your analysis of bugs found, however vast their number, means nothing unless it can be set against an analytical understanding of the test coverage achieved to date by the QA effort, and of the coverage yet to be achieved.
Because if that set of known defects, however much time has been spent on testing, represents only 30% actual test coverage—and you don’t know this—then you have no rational basis for knowing what they mean in terms of a decision to ship the software.
The real discussion around, and the real goal of, truly effective QA efforts cannot be about “zero defects”, but rather about “100% test coverage”. For the reason I give above.
The great advantage of this way of thinking about software quality is that it relies on a definition of adequate test coverage carried out by QA itself prior to testing. Therefore that definition itself can be tested against specifications, requirements, past customer issues and customer needs, etc., and modified as necessary.
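To make that last point concrete, here is a minimal sketch of what "testing the definition" might look like before a single test is run. It assumes the coverage definition is just a mapping from planned tests to the requirements they are meant to exercise. The requirement IDs, test names, and the `untraced_requirements` helper are all hypothetical, invented for illustration.

```python
# Hypothetical requirement IDs taken from a spec; purely illustrative.
requirements = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}

# The coverage definition: each planned test lists the requirements
# it is meant to exercise. Names are assumptions, not a real suite.
planned_tests = {
    "test_login": {"REQ-1"},
    "test_export": {"REQ-2", "REQ-3"},
}

def untraced_requirements(requirements, planned_tests):
    """Return the requirements that no planned test claims to cover."""
    covered = set().union(*planned_tests.values())
    return set(requirements) - covered

missing = untraced_requirements(requirements, planned_tests)
print(sorted(missing))  # ['REQ-4']
```

A non-empty result means the coverage definition itself has a hole, and it can be revised before testing starts. The same check could be run against past customer issues or any other source the definition is supposed to account for.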
Moving Forward: Zero Gap Test Coverage
Thinking about this problem in terms of defects fails precisely because it is not something that can be meaningfully defined beforehand, since the number and severity of issues can only be discovered after testing.
Beyond that, it is not judged against a pre-existing (meaningful) definition of what the threshold of “zero” could actually mean. It is a waste of time to define success in terms of a variable that is itself undefined, and undefinable.
Which is why the only way to salvage the idea of zero defect software is to channel that thinking and planning towards zero-gap test coverage. The motivation behind the zero-defect mantra is not wrong in itself. It’s just been misdirected from its true target. Make that adjustment, and meaningful, measurable definitions of acceptable software quality become possible.
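As a rough illustration of what "zero gap" could mean in practice, the sketch below compares the coverage items defined up front against the items actually exercised so far. It is one possible way to measure the gap under simple assumptions, not a prescribed method. The item names and the `coverage_report` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CoverageReport:
    planned: int   # number of coverage items defined before testing
    executed: int  # number of those items actually exercised
    gaps: list     # planned items not yet exercised

    @property
    def percent(self):
        # Coverage achieved against the predefined target.
        return 100.0 * self.executed / self.planned if self.planned else 0.0

def coverage_report(planned_items, executed_items):
    """Compare what was planned against what has been exercised."""
    planned = set(planned_items)
    done = planned & set(executed_items)  # ignore unplanned work
    return CoverageReport(len(planned), len(done), sorted(planned - done))

# Ten hypothetical coverage items, three of them exercised so far.
planned = [f"item-{i}" for i in range(1, 11)]
executed = ["item-1", "item-2", "item-3"]

report = coverage_report(planned, executed)
print(report.percent)    # 30.0
print(len(report.gaps))  # 7
print(not report.gaps)   # False: the gap is not yet zero
```

However many defects those three items turned up, the report says seven planned items have not been looked at, which is the number a ship decision actually needs.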