Defining Root Cause Analysis (RCA)

Updated: May 2, 2024



In this blog, I discuss the RCA definition I used as a career analyst/investigator and explain why it makes sense to me. You decide if it makes sense in your facility.

RCA is so ill-defined that no matter what people use to solve problems at their facilities (e.g., troubleshooting, brainstorming, problem-solving or scribbling on a bar napkin)…they will call it ‘RCA’. As a result, in the minds of leadership, all ‘RCA’ approaches are often viewed as equal.

RCA DEFINITIONS

Let me prove my point. I did a quick Google search on RCA and found three varying RCA definitions from credible sources:

1. WIKIPEDIA:

In science and engineering, root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems.

2. ASQ:

Root cause analysis (RCA) is defined as a collective term that describes a wide range of approaches, tools, and techniques used to uncover causes of problems. Some RCA approaches are geared more toward identifying true root causes than others, some are more general problem-solving techniques, and others simply offer support for the core activity of root cause analysis.

3. DEPT. OF ENTERPRISE RISK MANAGEMENT

Root cause analysis (RCA) is a systematic process for identifying “root causes” of problems or events and an approach for responding to them. RCA is based on the basic idea that effective management requires more than merely “putting out fires” for problems that develop but finding a way to prevent them.

From these RCA definitions, we see how wide the range of interpretations is – some are very generic, and others are more specific. But even more noteworthy, we see how ‘RCA’ can be legitimately misinterpreted, to the RCA analysts’ advantage.

For instance, in these definitions, we hear words like:

Uncover causes
Problem-solving
Root causes
Faults
Problems

INTERPRETATION OF RCA DEFINITIONS

All of these terms can be interpreted to be compliant with these RCA definitions.

What if we had an unexpected outage due to a pump failure? Our ‘RCA’ team concluded ‘the’ root cause was a fatigued bearing (physical root or PR). Their corrective action was to replace the bearing with one from another manufacturer or with a different type of bearing. Would that be a compliant ‘RCA’ given these definitions? Perhaps.

What if in the same case, we drill a little deeper and find that the fatigue is due to an original misalignment issue (human root or HR)? Would that be classified as a compliant ‘RCA’ given these definitions? Perhaps.

Drilling a little deeper, we strive to understand why someone would align the way they did (latent root or LR). In our evidence-based search, we find:

The procedures they were using were obsolete
The alignment tools they were using were inappropriate
The mechanic was not trained properly in current alignment best practices
The supervisor was not aware the mechanic was unqualified to do the alignments.

In each of these cases, would they be compliant with the above RCA definitions? Likely so because the definitions are vague enough to be left to interpretation. We have to be realistic that there is no perfect RCA definition, and all will be left to some level of misinterpretation. The best we can do is minimize the risk of misinterpretation while maintaining the effectiveness of the RCA.

Definition of Root Cause Analysis

“The establishing of logically complete, evidence-based, tightly coupled chains of factors from the least acceptable consequences to the deepest significant underlying causes.”

FIVE SIMPLE COMPONENTS OF ROOT CAUSE ANALYSIS:

Even when I look at this definition, it looks too complex and it uses intimidating engineering jargon. So, I simplified it. I broke it down into five simple components for us each to remember:

Logically Complete
Evidence-Based
Tightly Coupled
Least Acceptable Consequences
Deepest Significant Underlying Causes

Let’s look at Figure 1 to dissect this definition. Consider this logic tree as a graphical reconstruction of an undesirable outcome of some kind. (Full disclosure: it is just an abbreviated view for example’s sake, and it is not as linear in true practice.)

The ‘Event’ is the undesirable outcome. The ‘Modes’ are the facts accumulated from the scene that need to be explained. The ‘Hypotheses’ (H) reflect the exploration of logic to explain the Modes (the facts).

When we drill down past the Modes, we are exploring the physical nature of the failure or the ‘failure physics’. We lead this exploration with the question ‘How Can?’

Figure 1. Basic Logic Tree Representation

This brings us to our 1st RCA definition component:

Logically Complete:

This is the difference between asking ‘How Can’ and ‘Why’.

Think about this using a detective metaphor…if we ask ourselves ‘How a crime occurs’ versus ‘Why a crime occurs’, wouldn’t the answers be different? The use of ‘How Can’ to explore the physics of failure is appropriate because the physical sciences tend to have a more finite range of possibilities (e.g., how can fatigue occur?). We are not seeking just one linear answer; we are seeking all the possibilities that could have occurred. This is because failure is not linear, and normally multiple failure pathways converge at some point in time to cause a bad outcome.

As RCA analysts, we are continually recreating in our minds, visually, the events that occurred. This is just like the flashbacks we see on crime shows like CSI, where they play out a hypothesis. We are doing the same thing, like rolling back a video recording of the failure in short increments of time.

Notice, when we arrive at the decision maker (HR), our question shifts from ‘How Can’ to ‘Why’. I am not interested in ‘How Could’ someone make a decision, as the potential answers are infinite. I am interested in why, at that time, their decision seemed appropriate. That is very specific reasoning. At this point in the logic tree, the decision reasoning point, we switch from using deductive logic to inductive logic.

Evidence-Based:

Validating all hypotheses using sound evidence as opposed to hearsay.

How likely is it for a lawyer to win their case in court when their primary evidence is hearsay? How often do we see RCAs presented to us (or leadership) that are full of assumptions and hearsay? To me, if we do not have sound evidence to back up our hypotheses, IT IS NOT AN RCA! This is a critical element that is often missed because most of us are time-pressured to complete our RCAs. What takes the most time when doing an RCA…collecting the evidence!! Therefore, when that time pressure is applied, we are forced to take shortcuts, usually in the form of not properly validating our hypotheses.

In an effective RCA, each hypothesis has a verification log entry that includes a:

Verification Method
Verification Outcome
Person Responsible
Date Due/Date Complete

This is basically a ‘chain-of-custody’ approach applied to an RCA.
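As a rough illustration, the four log fields above could be captured in a small Python record. This is only a sketch: the class name `VerificationEntry`, the helper `open_hypotheses`, and every sample value are assumptions made for the example, not part of any particular RCA tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VerificationEntry:
    """One verification log entry for a single hypothesis (hypothetical layout)."""
    hypothesis: str
    verification_method: str
    person_responsible: str
    date_due: date
    verification_outcome: Optional[str] = None  # stays None until evidence is in
    date_complete: Optional[date] = None

    def is_verified(self) -> bool:
        # Treat a hypothesis as verified only when an outcome is recorded
        # and the verification has a completion date.
        return self.verification_outcome is not None and self.date_complete is not None


def open_hypotheses(log: List[VerificationEntry]) -> List[str]:
    """Return the hypotheses that still rest on assumption rather than evidence."""
    return [entry.hypothesis for entry in log if not entry.is_verified()]


# Illustrative entries only; names, methods and dates are made up.
log = [
    VerificationEntry(
        hypothesis="Bearing failed from fatigue",
        verification_method="Metallurgical lab analysis",
        person_responsible="A. Analyst",
        date_due=date(2024, 5, 10),
        verification_outcome="Confirmed",
        date_complete=date(2024, 5, 8),
    ),
    VerificationEntry(
        hypothesis="Pump was misaligned at installation",
        verification_method="Review of alignment records",
        person_responsible="B. Mechanic",
        date_due=date(2024, 5, 15),
    ),
]

print(open_hypotheses(log))
```

Listing the open hypotheses before a report goes to leadership makes it visible which parts of the analysis are still assumption.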

Tightly Coupled:

The utilization and expression of linear/non-linear, cause-and-effect logic.

In Figure 1, when reconstructing logic, level-to-level represents a cause-and-effect relationship. That is what ‘tightly coupled’ means; we can directly correlate logic from one level to another. So, with evidence, I can link the deficient management system (LR) to its direct influence on the decision-maker’s decision (HR). Then I can link the consequences of the decision to their physical, observable impacts (PR)…or to the physics of failure.
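To make the idea concrete, here is a minimal Python sketch of a logic tree built from the pump example earlier in this article. The `Node` class, the `chains` and `weak_links` helpers, and the evidence strings are all hypothetical; the sketch treats the Event and Mode as observed facts and expects evidence on every PR, HR and LR node beneath them.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    label: str
    level: str  # "Event", "Mode", "PR", "HR" or "LR"
    evidence: Optional[str] = None  # what validated this hypothesis, if anything
    causes: List["Node"] = field(default_factory=list)


def chains(node: Node, path: Tuple[Node, ...] = ()) -> Iterator[Tuple[Node, ...]]:
    """Yield every cause-and-effect chain from the Event down to a deepest cause."""
    path = path + (node,)
    if not node.causes:
        yield path
    for cause in node.causes:
        yield from chains(cause, path)


def weak_links(chain: Tuple[Node, ...]) -> List[str]:
    """Return hypothesis nodes in a chain that have no evidence behind them."""
    return [n.label for n in chain
            if n.level in ("PR", "HR", "LR") and n.evidence is None]


# Pump example from the article; the evidence strings are invented.
tree = Node("Unexpected outage", "Event", causes=[
    Node("Pump failure", "Mode", causes=[
        Node("Fatigued bearing", "PR", evidence="lab report", causes=[
            Node("Misalignment at installation", "HR", evidence="alignment records", causes=[
                Node("Obsolete alignment procedure", "LR", evidence="procedure revision history"),
                Node("Mechanic not trained in current practices", "LR"),
            ]),
        ]),
    ]),
])

for chain in chains(tree):
    print(" <- ".join(n.label for n in chain), "| unverified:", weak_links(chain))
```

Each chain reads level-to-level from consequence to cause, and any link without evidence stands out as the place where the coupling is still an assumption.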

This ‘coupling’ is different from categorical RCA approaches like fishbone diagrams (see Figure 2) that use cause categories to explore. Within these categories, we are encouraged to brainstorm what causes could have contributed to the overall undesirable outcome. Using this approach does not reflect direct cause-and-effect relationships.

Figure 2. Conventional 6M Fishbone Diagram Expression

Least Acceptable Consequences:

The threshold or trigger which initiates an RCA in an organization.

This will vary from company to company based on different drivers such as regulatory requirements and internal KPIs. However, such triggers are usually reactive in nature, as the thresholds involve serious consequences like significant losses, injuries/deaths, and regulatory violations.

I would encourage analysts to lower the RCA triggers to include proactive opportunities, such as:

Chronic failures with an annual cost exceeding X dollars
Near misses with potentially severe outcomes
Unacceptable risks generated from credible risk assessments such as FMEAs
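The proactive triggers above could be expressed as a simple screening rule. The Python sketch below is only an assumption about how one might encode them: the `Candidate` fields, the `rca_triggers` function and the 50,000 threshold (standing in for ‘X dollars’) are placeholders that each organization would set for itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A problem being screened for RCA (hypothetical fields)."""
    description: str
    annual_chronic_cost: float = 0.0            # yearly cost of a recurring failure
    near_miss_potentially_severe: bool = False  # near miss that could have been severe
    unacceptable_assessed_risk: bool = False    # flagged by a credible risk assessment


def rca_triggers(candidate: Candidate, cost_threshold: float) -> List[str]:
    """Return the proactive triggers this candidate meets (empty list means none)."""
    reasons = []
    if candidate.annual_chronic_cost > cost_threshold:
        reasons.append("chronic failure cost above threshold")
    if candidate.near_miss_potentially_severe:
        reasons.append("near miss with potentially severe outcome")
    if candidate.unacceptable_assessed_risk:
        reasons.append("unacceptable risk from risk assessment")
    return reasons


# Placeholder threshold standing in for "X dollars".
COST_THRESHOLD = 50_000.0

seal_leaks = Candidate("Recurring seal leaks", annual_chronic_cost=80_000.0)
minor_trip = Candidate("One-off nuisance trip", annual_chronic_cost=2_000.0)

print(rca_triggers(seal_leaks, COST_THRESHOLD))
print(rca_triggers(minor_trip, COST_THRESHOLD))
```

Returning the reasons rather than a bare yes/no keeps a record of why an RCA was opened.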

Deepest Significant Underlying Causes:

The point at which drilling down in an RCA ceases to be value-added.

This is a question we often hear in training, ‘When do we stop drilling down?’. It is a fair and legitimate question because one can take an analysis back to Adam and Eve if they want, but at what point is it non-value added?

My simple rule of thumb is that when the corrective actions involve going outside the fence (the boundaries of the facility), we may not have control of the fix. We can control fixing systems in our organizations, but we can’t control socio-technical factors like changing regulations and laws. That doesn’t mean they shouldn’t be addressed, but it does mean that we can hand off those corrective actions to people who can effectively address them. We want to focus on implementing fixes that we can control.

CONCLUSION

There are many definitions of RCA in the marketplace, and many of them have different purposes. For instance, regulatory definitions drive compliance. Vendors’ definitions often have a commercial variant related to their proprietary approach. Plus, there are even more definitions from researchers and academics that often reflect theory versus practice. So, it’s a buyer-beware market where ‘RCA’ is concerned. In the end, use what works best for you!

Our recommendation is that the focal point of every RCA effort be its EFFECTIVENESS. Unfortunately, just being RCA compliant does not mean your RCA effort is effective. Those in the field know what I’m talking about.

We at RCI utilize this particular RCA definition, because we feel it represents the effectiveness of a holistic RCA system. It is not related to our commercial products alone, as it can be applied to any form of ‘RCA’.

Collectively, as RCA professionals, we must unite to defeat this paradigm:

‘We NEVER seem to have the time and budget to do RCA right, but we ALWAYS seem to have the time and budget to do RCA again!’

About the Author
Robert (Bob) J. Latino is the former CEO of Reliability Center, Inc., a company that helps teams and companies do RCAs with excellence. Bob has been facilitating RCA and FMEA analyses with his clientele around the world for over 35 years and has taught over 10,000 students in the PROACT® methodology.

Bob is co-author of numerous articles, has led seminars and workshops on FMEA, Opportunity Analysis and RCA, and is co-designer of the award-winning PROACT® Investigation Management Software solution. He has authored or co-authored six (6) books related to RCA and Reliability in both manufacturing and healthcare and is a frequent speaker on the topic at domestic and international trade conferences.

Bob has applied the PROACT® methodology to a diverse set of problems and industries, including a published paper in the field of Counter Terrorism entitled, “The Application of PROACT® RCA to Terrorism/Counter Terrorism Related Events.”
