A company’s struggles are not always a matter of not knowing how to fix its problems. Often, the true origins of those problems are hidden, which is why root cause analysis (RCA) is so important to good leadership within an organization. When I give speeches around the world, I often poll my audiences on how they define ‘RCA’. The fact is, I get as many answers as the number of people I ask!
Are there definitions out there? Absolutely; there are hundreds of them. But you don’t need to read about all of them. Here’s what you need to know:
Three Categories of Root Causes
1. The Latent Root Causes
Management (or organizational) systems are the rules and guidelines under which our organizations operate, much like the laws of our lands govern how our countries operate. Because these systems are created and maintained by humans, they are not flawless. They can be insufficient, inadequate, wrong, or even non-existent (for unforeseen situations). We refer to these management system flaws as Latent Root Causes.
Examples of Latent Root Causes: an inadequate torque wrench calibration system, a weak and rarely enforced accountability system, and less-than-adequate training. Latent root causes like these are always there, lying dormant and waiting to be activated by a human.
2. The Human Root Causes
Flawed management systems feed incomplete and/or inaccurate information to the people who must process that information to make decisions. Ultimately, this is likely to result in an inappropriate decision for the situation at hand! We refer to these ‘decisions’ or ‘choices’ as Human Root Causes.
When humans make an inappropriate decision, it is expressed in one of two ways: 1) an error of commission or 2) an error of omission. An error of commission means we took an inappropriate action; an error of omission means we should have taken an appropriate action and didn’t.
Examples of Human Root Causes: There are countless examples of human root causes. For instance, an error of commission could involve closing a valve in a manufacturing operation when it should have been kept open. An error of omission, on the other hand, might occur when an emergency room nurse fails to give a patient the correct priority, with the tragic outcome that the patient dies while waiting for care in the waiting room.
3. The Physical Root Causes
Decision errors often result in observable consequences. Up to this point, the error chain has not been visible, because it exists only in the mind of the decision-maker. Only after the decision is made do the consequences become observable. We refer to these consequences as Physical Root Causes.
Examples of Physical Root Causes: These observable consequences include physical signs of failure, such as a worn-out part, exposure to harmful substances, or a corroded pipe.
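The three categories link into an error chain: a latent (management system) flaw feeds a human decision error, which produces a physical consequence. The Python (3.9+) sketch below is a hypothetical illustration of that chain, not part of any RCA product or methodology; the names `LatentCause`, `HumanCause`, `PhysicalCause` and `reaches_latent_level` are our own. It flags an analysis that stops at the physical or human level without reaching the latent causes.

```python
from dataclasses import dataclass, field
from enum import Enum


class ErrorType(Enum):
    """The two ways an inappropriate decision is expressed."""
    COMMISSION = "took an inappropriate action"
    OMISSION = "did not take an appropriate action"


@dataclass
class LatentCause:
    """A management system flaw: insufficient, inadequate, wrong, or missing."""
    description: str


@dataclass
class HumanCause:
    """An inappropriate decision, linked to the latent causes that informed it."""
    description: str
    error_type: ErrorType
    latent_causes: list[LatentCause] = field(default_factory=list)


@dataclass
class PhysicalCause:
    """An observable consequence, linked to the decisions that produced it."""
    description: str
    human_causes: list[HumanCause] = field(default_factory=list)


def reaches_latent_level(physical: PhysicalCause) -> bool:
    """Return True only if every human cause is traced to at least one latent cause.

    An analysis that stops at physical or human causes leaves the flawed
    management systems (the 'seeds' of failure) in place.
    """
    return bool(physical.human_causes) and all(
        human.latent_causes for human in physical.human_causes
    )


if __name__ == "__main__":
    # Hypothetical chain, loosely built from the examples in the text.
    calibration = LatentCause("inadequate torque wrench calibration system")
    decision = HumanCause(
        "bolts torqued with an uncalibrated wrench", ErrorType.COMMISSION, [calibration]
    )
    failure = PhysicalCause("bolted joint worked loose", [decision])
    print(reaches_latent_level(failure))  # True
```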
How to Perform Root Cause Analysis
A common RCA myth among managers is that all RCA methods are the same, when in fact they are NOT. Many RCA techniques place little to no emphasis on establishing all the possible ways a problem can occur, while others increase the user’s overall effectiveness with pre-built logic templates that help ensure all the possible ways of occurrence are available for discovery. Verification of each possibility can likewise range from someone saying it happened (weak verification) to reconstruction and testing of each possibility (strong verification).
RCA methods can be shallow or robust; it depends on what management wants to accomplish. Here are some analytical methodologies that are often grouped into the category of RCA:
1. Troubleshooting
When something ‘fails’ in our workplaces, do you picture the faces of the people we rely on to make everything right? These are our ‘heroes’ who get us back to normal quickly, and they receive a fair amount of recognition for doing so. However, that recognition rewards them for being great responders. They rarely do any analysis to understand root causes, but they are great at implementing temporary solutions. Under these conditions, a progressive management team would be asking, “Why is this person getting so much practice at responding?” That’s where the meat is. This approach is normally attractive because it is quick and inexpensive, and it demonstrates immediate action.
2. Brainstorming
We can all relate to this analytical technique. When something bad happens, we tend to put very smart people in a room to listen to a summary of what we know happened (the bad outcome). Oftentimes this description presents hearsay as fact, and the group accepts that hearsay as it moves on to solutions. Many bright people throw out disconnected ideas about what they think happened, and then they move on to action plans. This approach usually prioritizes speed to demonstrate activity, and as a result it is weak on evidence and on analysis down to root causes. Chances are this team will be meeting again because the failure will recur. This approach takes a little longer than troubleshooting because we are dealing with several individuals rather than one, which requires more give-and-take in discussions.
3. Problem Solving
This approach is very common; it is essentially brainstorming plus a structured analytical tool such as the 5-Whys or a fishbone diagram. The ‘tool’ provides a degree of discipline because it has a series of steps and a structure to follow. This is certainly more progressive than troubleshooting and brainstorming.
This approach, when applied properly, can be effective. However, in my 35+ years in this business, I have found that it is often not applied properly. People are usually under pressure to hurry, so the time needed to collect evidence gets sacrificed. We run with hearsay and assumptions, treat them as facts, and develop and implement recommendations accordingly.
On the common association of the 5-Why technique with RCA, Dick Swanson (Owner, Performance Management Initiatives, Inc.) says, “The irony of this association is rooted in the fact that the 5-Why approach was developed by Toyota as a tool for assembly floor supervisors to keep production moving, and not as a tool to identify deep, underlying causes of complex events”.
I will add another potential form of RCA, used by many in industry: ‘Trial and Error’. I do not list it with the others because I don’t really consider it an analytical process. This approach just supports the paradigm of ‘if it ain’t broke, don’t fix it!’ It is more akin to a crisis maintenance strategy; there is really no analysis going on, we just fix things when they break.
What Constitutes a ‘Valid’ RCA?
The intent of “true” Root Cause Analysis is to mitigate or eliminate the possibility of recurrence. For this to happen, the methodology used must start from a problem definition that is accurate and factual. The possible ways the problem can occur must be identified, and each possibility must be verified as true (did happen) or false (did not happen) using sound evidence (not hearsay).
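The verification requirement can be illustrated with a small Python (3.9+) sketch. It is a hypothetical model, not a feature of any particular RCA methodology or software: each `Hypothesis` is one possible way the problem could have occurred, and it is marked true or false only when backed by sound evidence, while hearsay alone leaves it unverified. The evidence categories, and the assumption that records count as sound evidence alongside reconstruction and testing, are ours.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EvidenceKind(Enum):
    HEARSAY = "someone said it happened"      # weak verification
    RECORD = "logs, readings, or other data"  # assumption: treated as sound
    TEST = "reconstruction and testing"       # strong verification


# Assumption: only these kinds count as sound evidence; hearsay never does.
SOUND_EVIDENCE = {EvidenceKind.RECORD, EvidenceKind.TEST}


@dataclass
class Evidence:
    kind: EvidenceKind
    shows_it_happened: bool  # False means the evidence shows it did NOT happen


@dataclass
class Hypothesis:
    """One possible way the problem could have occurred."""
    description: str
    evidence: list[Evidence] = field(default_factory=list)

    def verdict(self) -> Optional[bool]:
        """True (did happen), False (did not happen), or None (not yet verified)."""
        sound = [e for e in self.evidence if e.kind in SOUND_EVIDENCE]
        if not sound:
            return None  # hearsay alone cannot verify anything
        findings = {e.shows_it_happened for e in sound}
        if len(findings) > 1:
            return None  # conflicting sound evidence: keep investigating
        return findings.pop()


def unverified(hypotheses: list[Hypothesis]) -> list[str]:
    """List the hypotheses that must not yet be treated as fact."""
    return [h.description for h in hypotheses if h.verdict() is None]
```

Returning `None` rather than guessing keeps hearsay-only and conflicting possibilities visible as open items instead of letting them slip into the analysis as facts.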
To look at RCA agnostically, getting away from brands and labels, let’s briefly explore what core steps constitute a valid RCA. I suggest the following:
- Use of a disciplined evidence-gathering approach that includes identifying the relevant evidence to collect, preserving that evidence in the field, defining a strategy to collect it, and developing a plan for storing and managing it,
- Converting the evidence into useful information (i.e., qualification, validation and verification [QV&V]),
- Mitigating/minimizing potential biases of team leader and team members,
- Creation of an efficient and effective means to graphically express and communicate the reconstruction of the Event, clearly identifying proven causal factors,
- Ensuring the proper and timely implementation of approved corrective actions,
- Tracking effectiveness of implemented corrective actions against measurable and meaningful bottom-line metrics, both leading and lagging indicators,
- Aggregating learning from RCAs across a corporation and cultivating and growing an RCA knowledge management system,
- Leveraging learning from successful RCAs across an organization to prevent recurrence.
How to Convert a Shallow Cause Analysis into a Root Cause Analysis
1. Allow Proper Time
Time pressure has a huge impact on the effectiveness of an RCA. When anyone is pressured to do something quickly, they will often take shortcuts. In the RCA world, the shortcuts are likely to come from the most time-consuming part of an analysis or investigation: the data collection phase. When we take shortcuts in gathering our evidence, we increase the risk of recurrence, because we are operating not on facts but on hearsay and assumptions.
In summary, the faster we do an analysis due to time pressures, the more likely we are to do the same analysis again. This is because we are likely to be weak on evidence and our focus is on solutions and not analysis. When RCA is done properly, it does take more time to conduct an effective analysis. However, we should not have to do it again if we did it right the first time!
2. Position The Right People
Many managers underestimate the amount of support needed for a successful RCA system. The paradigm of, “Send the candidates to RCA training and they will solve problems,” rarely works.
The RCA infrastructure is often not well thought out, and when practitioners encounter obstacles, they are unable to complete their RCA successfully. This usually causes the practitioner to abandon the internal drive to execute the process correctly.
There are common barriers encountered by newly trained analysts. The student/analyst:
- does not perform an RCA right away (it may be months later) and forgets how to perform one as learned in class.
- gets the problem definition wrong (treats hypotheses as factual modes) and ends up with a disconnected analysis.
- does not know whether a “trigger” has been reached until weeks later, by which time the event data is no longer available (problems cannot be solved without data).
- asks to have evidence analyzed by a third party, but there is no budgeted funding available.
- has a deadline for completing the RCA (sometimes as little as 48 hours) and may cut corners to meet it.
- has no experience performing RCAs (does not know what success looks like).
- has no mentor available to review analyses and give feedback.
- is not able to implement recommendations (they have low priority in the work order system).
- is not able to track the results of implemented recommendations.
- cannot prove ROI of RCA effort to leadership (cannot make the business case).
How to Make Your RCAs More Effective
1. Evaluating Yourself
Now that we understand where a failure comes from and how the error chain grows, how can we make our RCA processes more effective? Why do we often seem to be doing RCAs on the same events, over and over again? Are we not learning from the past? Is it that our RCAs just aren’t that good?
Having been an RCA practitioner for over 35 years in various industry sectors, I have observed that we have a difficult time looking in the mirror and accepting that we could be part of the problem!
Many organizations seem content with their RCA processes as long as their analyses pass some kind of regulatory audit. Passing such an audit or survey means the regulators are off their backs…for now.
However, that is not the true measure of RCA effectiveness and it is misleading. RCA effectiveness should be measured based on quantifiable and meaningful bottom-line metrics that correlate to corporate dashboards or KPIs.
The key to RCA effectiveness is facing the truth, and unfortunately, we are not very good at accepting the truth when it involves our taking an introspective look at our potential contribution to the bad outcome.
The ‘truth’ is embedded in the management systems we spoke about earlier. Oftentimes we focus on the decision-makers and then levy discipline for making a poor decision. However, RCA is not about ‘who’ made the poor decision. We are more interested in why the person felt his or her decision was appropriate at the time.
When we get into the decision-maker’s head and understand their reasoning for their decisions, most of the time their rationales are perfectly logical. Their decisions are most often well-intended. More importantly, others would likely make the same decision given the same information and under the same conditions.
Delving this deep brings us right back to the flawed management systems that provide people with such information. These systems are supposed to be in place to help our people make better decisions, so when they are flawed, our operations are at risk of not performing as intended.
Think about it: if we choose to ignore these deeper issues (because it is easier and more comfortable to do so), then the ‘seeds’ of failure remain implanted in our systems. They will simply be activated by someone else at a later time, and the patient in the hospital, or the operation in the plant, will be at risk once again.
For RCAs to be truly effective, we have to look in the mirror and face the possibility that we could have unintentionally contributed to the bad outcome…that is the only way we will make progress. This type of openness and non-punitive environment is a key principle of a High Reliability Organization (HRO).
Remember, “We NEVER seem to have the time and budget to do things right, but we ALWAYS seem to have the time and budget to do them again!”
2. Evaluating Your RCA
At this point our RCA is complete, and now we have to develop, sell and implement our solutions. Remember, RCA is a ‘system’ and not a task, and this is yet another critical link in the RCA chain. If we can’t sell the need for our recommendations, all the investigative and analytical work we did was a waste of time (plus we would be less driven to do a great job next time).
As analysts, we have to ask ourselves ‘What is our definition of success?’ for our analysis. Compliance should NOT be the definition of success for an RCA.
Conducting an assessment of our RCA system on an annual basis will allow us to measure our progress. Such an assessment will identify the areas where we are strong, as well as where we could use improvement. In our weaker areas, we can take corrective actions to shore up the RCA system and help our analysts be the best they can be.
3. Helpful Tips
Management can increase the success rate of RCAs by making sure the infrastructure is in place to:
- provide performance criteria
- provide reasonable time for analysis
- process the recommendations
- remove barriers
- provide technical support
- provide skill-based training
- provide IT support
- create committed RCA teams
- provide effective leading and lagging metrics and a means to track implemented solutions
How RCAs Contribute to the Bottom-Line
In order for an RCA to be successful, there has to be some type of bottom-line improvement. Something has to get better as a result of your RCA; what is that? Simply clicking a checkbox on a list to indicate your RCA is complete is not (or shouldn’t be) a measure of success. That just means the determination of causes may be complete, but we still have nothing to show for it on the bottom line.
Most RCAs tend to drop off a cliff at this point because there is a lack of accountability for the recommendations. Each recommendation should have an assigned person responsible for ensuring it is completed, along with a due date. Each recommendation should also have a cost/benefit calculation attached to it to measure ROI. This will greatly aid in selling the recommendation to finance people.
To close the loop on viewing RCA as a ‘system’, closure means that a measurable, demonstrable benefit has been realized. This means we must have tracking mechanisms in place to measure the effectiveness of each recommendation and of the RCA overall.
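As a minimal sketch of this kind of accountability tracking, the hypothetical Python (3.9+) `Recommendation` record below carries an owner, a due date, and cost/benefit estimates. The ROI formula is an assumption: a simple first-year return, (annual benefit - cost) / cost, expressed as a percentage. Organizations may calculate ROI differently, for example over several years or with discounting.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Recommendation:
    description: str
    owner: str             # person accountable for completing it
    due: date
    cost: float            # estimated implementation cost
    annual_benefit: float  # estimated yearly savings once implemented
    completed_on: Optional[date] = None

    def roi_percent(self) -> float:
        """Simple first-year ROI (an assumption): (benefit - cost) / cost, as a percentage."""
        if self.cost <= 0:
            raise ValueError("cost must be positive to compute ROI")
        return (self.annual_benefit - self.cost) / self.cost * 100

    def is_overdue(self, today: date) -> bool:
        """Not yet completed and past its due date."""
        return self.completed_on is None and today > self.due


def overdue_by_owner(recs: list[Recommendation], today: date) -> dict[str, list[str]]:
    """Group overdue recommendations by owner, for follow-up."""
    report: dict[str, list[str]] = {}
    for rec in recs:
        if rec.is_overdue(today):
            report.setdefault(rec.owner, []).append(rec.description)
    return report
```

For example, a recommendation estimated to cost 20,000 and save 50,000 per year gives a `roi_percent()` of 150.0 under this formula, and `overdue_by_owner` produces a simple list of whom to follow up with.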
As part of our RCA management support systems, leadership should tell us what its expectations are for the RCA initiative. Oftentimes these expectations are tied to the corporate dashboards and/or KPIs, and we should be able to demonstrate that our RCAs are narrowing the gaps in those corporate metrics.
To round out our RCA system, here are some questions we should ask when it comes to measuring effectiveness:
1. Are systems in place to oversee if assigned tasks are actually being implemented?
Without such oversight, if someone is not doing their task and there is no negative consequence, they likely never will. They have other priorities and this one may be low, especially when no one is checking to see if the task was done.
2. Are systems in place to ensure the results of the RCA are shared across the organization so that others can learn from them?
As mentioned earlier, one of the greatest benefits of an effective RCA system is the creation of a living and growing knowledge management system: a database of RCA experience, or ‘corporate memory’. This would prevent people from having to do the same RCAs over and over again just because they did not know one had been done in the past. Imagine the cost of re-work when we have to do RCAs over and over again. Who’s calculating that number in an organization?
3. Are we reporting the ROI of our RCAs back to leadership to justify the existence of our RCA initiative?
I can assure you, as a CEO myself, that if I see such initiatives saving my company millions of dollars per year, I will continue to invest in them. As an FYI, our documented average ROI for our case study database is over 600% (as published in our books). That will raise the eyebrows of any finance person.
4. Last but not least, are we reporting our RCA results back to those in the field who provided input to the analyses?
If not, we should be. They will see that they were part of something successful, and they will be motivated to continue helping in the future.
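The ‘corporate memory’ described in question 2 can start as simply as an index of completed RCAs that is checked before a new analysis begins. The Python (3.9+) sketch below is a hypothetical, in-memory illustration; the names `RcaRecord` and `RcaKnowledgeBase`, and the choice to key records by asset and failure mode, are assumptions, and a real system would sit on a shared database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RcaRecord:
    rca_id: str
    asset: str         # equipment, unit, or process analyzed
    failure_mode: str  # what failed, in the organization's standard wording
    summary: str       # causes found and corrective actions taken


class RcaKnowledgeBase:
    """In-memory index of completed RCAs, keyed by (asset, failure mode)."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], list[RcaRecord]] = defaultdict(list)

    @staticmethod
    def _key(asset: str, failure_mode: str) -> tuple[str, str]:
        # Normalize case and surrounding whitespace so near-identical entries match.
        return (asset.strip().lower(), failure_mode.strip().lower())

    def add(self, record: RcaRecord) -> None:
        self._index[self._key(record.asset, record.failure_mode)].append(record)

    def prior_analyses(self, asset: str, failure_mode: str) -> list[RcaRecord]:
        """Check this before starting a new RCA on the same problem."""
        # .get() avoids inserting empty entries for lookups that find nothing.
        return list(self._index.get(self._key(asset, failure_mode), []))
```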
In the end, an analysis is only as good as the analyst!
About the Author
Robert (Bob) J. Latino is the former CEO of Reliability Center, Inc., a company that helps teams and companies perform RCAs with excellence. Bob has been facilitating RCA and FMEA analyses with his clientele around the world for over 35 years and has taught over 10,000 students in the PROACT® methodology.
Bob is co-author of numerous articles, has led seminars and workshops on FMEA, Opportunity Analysis and RCA, and is co-designer of the award-winning PROACT® Investigation Management Software solution. He has authored or co-authored six (6) books related to RCA and Reliability in both manufacturing and healthcare and is a frequent speaker on the topic at domestic and international trade conferences.
Bob has applied the PROACT® methodology to a diverse set of problems and industries, including a published paper in the field of Counter Terrorism entitled, “The Application of PROACT® RCA to Terrorism/Counter Terrorism Related Events.”