Effective Root Cause Analysis Means Accepting We Could Be Part of the Problem
Abstract: No matter where we work, we will experience failures or ‘undesirable outcomes’ of some kind. As long as we work with other humans, this will indeed be the case. These failures may surface in the form of production delays, injuries, customer complaints, missed deadlines, lost profits, legal claims and the like.
To prevent recurrence of any such undesirable outcome, we have to truly understand the causes that led up to it. In many of our worlds, the process used to analyze and understand what went wrong is called Root Cause Analysis, or RCA. For the sake of this article, however, call this process whatever you want: problem solving, brainstorming, troubleshooting, etc. The common denominator of these terms is that they all aim to resolve a failure and ensure it does not happen again.
Let’s get away from labels and specific industries and focus on the anatomy of a ‘failure’. Where does a failure come from? Think about this no matter where you work and see if it applies.
The ‘Root’ System
The seeds of failure are what we will call our management (or organizational) systems. These are the rules and guidelines under which our organizations operate, much like the laws that govern how our countries operate. Since these are created and maintained by humans, they are not flawless. They can be insufficient, inadequate, wrong and even non-existent (for situations unforeseen). We refer to these management system flaws as Latent Root Causes because they are always there, lying dormant, waiting to be activated by a human.
When our management systems are flawed in some fashion, we feed incomplete information to people who must process the information to make their decisions. Ultimately, this will likely result in a bad decision! We refer to these ‘decision errors’ as Human Root Causes.
When humans make a wrong decision, it is expressed in one of two ways: 1) errors of commission or 2) errors of omission. Either we took an action that was inappropriate (error of commission), or we should have taken an appropriate action and didn’t (error of omission).
The examples of these two situations are endless. An error of commission may be that we closed a valve in a manufacturing operation that we should have left open. An error of omission may be that an ER nurse improperly triaged a patient and, as a result, the patient died in the waiting room while waiting for care.
When humans make decision errors, they often result in observable consequences. Up to this point, the error chain has not been visible because it exists only in the mind of the decision-maker. Only after the decision is made are the consequences observable. We will refer to these consequences as Physical Root Causes.
Let’s follow through with our examples used earlier. A manufacturing plant operator turns off a valve that cuts off water flow that would have cooled an overheated process. As a result, the overheated process causes an unexpected interruption that automatically shuts down the entire operation.
In the emergency room of the hospital, the improperly triaged patient flatlines and a Code Blue is called, forcing a rapid response team to tend to the patient. The patient had an underlying condition that was not detected during the initial triage assessment, and as a result the patient had a stroke and passed away.
In both of these scenarios, after the decisions were made, the consequences of the decisions became apparent.
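The three levels described above can be sketched as a small data structure. This is only an illustrative model, not part of any RCA tool: the names `Level`, `Cause`, `order_chain` and `missing_levels` are hypothetical, and the example records just the two causes the valve scenario actually states, leaving the latent cause as something the analysis still has to uncover.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Position in the error chain, from dormant flaw to visible consequence."""
    LATENT = 1    # flawed management system
    HUMAN = 2     # decision error (commission or omission)
    PHYSICAL = 3  # observable consequence


@dataclass(frozen=True)
class Cause:
    level: Level
    description: str


def order_chain(causes):
    """Return the causes ordered latent -> human -> physical."""
    return sorted(causes, key=lambda c: c.level)


def missing_levels(causes):
    """Return the levels for which no cause has been identified yet."""
    found = {c.level for c in causes}
    return [lvl for lvl in Level if lvl not in found]


# The manufacturing example from the text; no latent cause is recorded
# because the scenario does not state one.
valve_event = [
    Cause(Level.PHYSICAL, "Overheated process triggers an automatic shutdown"),
    Cause(Level.HUMAN, "Operator closes a cooling-water valve that should have stayed open"),
]

for cause in order_chain(valve_event):
    print(f"{cause.level.name.title()} Root Cause: {cause.description}")
print("Still to uncover:", [lvl.name for lvl in missing_levels(valve_event)])
```

An analysis that stops here has explained what happened and who acted, but `missing_levels` still reports the latent level, which is the management system flaw that made the decision seem reasonable.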
RCA Effectiveness – Facing reality that we could be part of the problem
Now that we understand where a failure comes from and how the error chain grows, how can we make our RCA processes more effective? Why do we often seem to be doing RCA on the same events, over and over again? Are we not learning from the past? Is it that our RCAs just aren’t that good?
Having been an RCA practitioner now for over 30 years working in various industry sectors, my observation is that we have a difficult time looking in the mirror and accepting that we could be part of the problem!
Many organizations seem content with their RCA processes as long as their analyses pass some kind of regulatory audit, because passing means the regulators are off their backs.
However, that is not the true measure of RCA effectiveness and it is misleading. RCA effectiveness should be measured based on quantifiable and meaningful bottom-line metrics that correlate to corporate dashboards or KPIs.
In our hospital scenario, just because we passed an RCA audit or survey, is the patient any safer? Almost all 6,000 hospitals in the U.S. are accredited, yet deaths due to medical error continue to rise, to the point that medical error is the third leading killer of Americans today at over 1,000 deaths per day (http://www.healthcareitnews.com/news/deaths-by-medical-mistakes-hit-records).
The key to RCA effectiveness is facing the truth, and unfortunately, we are not very good at accepting the truth when it involves ourselves.
The ‘truth’ is embedded in the management systems we spoke about earlier. Oftentimes we focus on the decision-makers and then levy discipline for making a poor decision. However, RCA is not about ‘who’ made the poor decision. We are more interested in why the person felt his or her decision was appropriate at the time. In my opinion, this is what RCA is all about!
When we get into decision-makers’ heads and understand the reasoning behind their decisions, most of the time their rationales are perfectly logical. Their decisions are most often well-intended. And more importantly, others would likely make the same decision given the same information.
When we delve this deep, we are brought right back to the flawed management systems that provided these people with such information. These systems are supposed to be in place to help our people make better decisions. So when they are flawed, our systems are at risk of not performing as intended.
Let’s reflect back on our hospital scenario described earlier. A patient comes to their local ER and is assessed by the nurse, PA or MD. Those conducting the triage certainly did not intend for the patient to be harmed while waiting for care. So what could have led them to believe that this particular patient could wait, relative to the acuity of the other patients in the ER? Here are just a couple of possibilities:
1. Inexperienced person conducting triage.
2. ER overloaded and staff was time-pressured and understaffed.
From a management system standpoint, if the above existed, we would have to drill deeper and understand the systems that permitted those conditions to exist.
1. Why would we have an inexperienced person conducting triage in the ER?
a. Person scheduled to do triage was unavailable due to another emergency that pulled them away (either at the hospital or a family emergency), so someone who was available was pulled from another department.
b. Person was a new hire and new to the position.
2. Why would we be understaffed when the ER was overloaded?
a. We did not anticipate the overload.
b. We did not have a plan in place to activate under such conditions.
c. We had a plan in place to handle the overload but we did not follow it.
d. We had a plan in place to handle the overload and followed it, but it was obsolete. It had not been updated since the addition of new technologies and the expansion of the ER.
Certainly this is not a comprehensive listing but it makes the point. This is where the mirror comes into play.
What if we were the person who:
1. Allowed the inexperienced triage person to work in that capacity, because things were hectic and confusing at the time?
2. Did not update the procedure for handling an overload condition, when the ER was updated and expanded?
3. Did not follow the procedure for an overload condition?
4. Trained the person conducting the triage and they were not ready yet?
These are the sensitive issues that a true RCA would seek to understand and uncover. This is the hard part of RCA: uncovering the truth. This is where most RCAs lack depth, because people prefer not to deal with these sensitive but absolutely necessary issues.
Think about it: if we choose to ignore these deeper issues (because it is easier and more comfortable to do so), then the ‘seeds’ of failure are still implanted in our systems. This just means they will be activated by someone else at a later time, and the patient or operation will be put in peril once again.
For RCAs to be truly effective, we have to look in the mirror and face the possibility that we could have unintentionally contributed to the bad outcome…that is the only way we will make progress. This type of openness and non-punitive environment is a key principle of a High Reliability Organization (HRO).
Remember, “We NEVER seem to have the time and budget to do things right, but we ALWAYS seem to have the time and budget to do them again!”
About the Author
Robert (Bob) J. Latino is former CEO of Reliability Center, Inc., a company that helps teams and companies do RCAs with excellence. Bob has been facilitating RCA and FMEA analyses with his clientele around the world for over 35 years and has taught over 10,000 students in the PROACT® methodology.
Bob is co-author of numerous articles, has led seminars and workshops on FMEA, Opportunity Analysis and RCA, and is co-designer of the award-winning PROACT® Investigation Management Software solution. He has authored or co-authored six books related to RCA and Reliability in both manufacturing and healthcare and is a frequent speaker on the topic at domestic and international trade conferences.
Bob has applied the PROACT® methodology to a diverse set of problems and industries, including a published paper in the field of Counter Terrorism entitled, “The Application of PROACT® RCA to Terrorism/Counter Terrorism Related Events.”