No-blame cultures

Picture the scene: something broke in production. If you’re doing it right, it was detected by your monitoring tools. If you’re doing it wrong, your users helpfully informed you: by email, on the phone, through Reddit. You desperately scramble around for a solution. You roll back a release, modify some configuration, reboot a machine. Something worked; production is back up. The phones stop ringing. You take a deep breath.

Now that the panic has subsided, you try to identify the problem. You check some logs, see what deployments have happened recently, talk it over with the team. After a while, you have a solid understanding of exactly what went wrong and what could have been done to prevent it.

What you do next defines the culture of your organisation.

All problems are obvious with hindsight. Maybe the developer left a bug in the code, maybe QA didn’t think of that test scenario, maybe someone misconfigured the server. There will be at least one person, and possibly several, who could have done something better and so avoided the problem. So it’s their fault, right?

Wrong. Humans make mistakes. Always have, always will. If your quality strategy relies on humans not making mistakes, you don’t have a quality strategy.

When things go wrong: fix them, then learn from them. How can you change your tools or processes in a way that prevents this from happening in the future? If you replace the human, and leave the system alone, then you’re just begging for the same problem to happen again. Your new human is no more infallible than the last one; the true fault lies elsewhere.

The best organisations have mastered this. Go and read the post-mortem of any major AWS outage. You’ll find a detailed description of the changes they have made, or will make, to avoid a repeat.

Every production issue you encounter should be the last time that specific issue happens. Achieve that consistently, let the clock run a while, and you end up with an organisation that sees very few issues.