Root Cause Analysis Without Verification is Conventional Wisdom
Root cause analyses are done every day all over the world. Organizations do root cause analysis because it is good business. Events ranging from an explosion in an auto parts plant to the wrong prescription being given to a patient can have dramatic monetary, physical, and psychological effects on an organization. Solving problems down to their true roots will reduce or eliminate the chance of the same event happening again.
In many cases we interview the witnesses, collect some data, and get all of our experts together in a room to brainstorm how the incident could have occurred. When all the participants have come to a consensus on the most likely causes, they begin to collaborate on solutions that will eliminate those causes. With the solutions complete, an implementation plan is developed and presented to the appropriate parties, and a paper is written to document the analysis. This technique of solving problems is usually ineffective in getting to the true root causes. In many cases you can get to the physical roots of a problem, which is usually the component that broke. When you stop at the component failure, you have completed a failure analysis, not a root cause analysis.
I was recently talking with a client about root cause analysis. He was telling me that some of the problems they were encountering had clear solutions based on the data collected. He offered the example of a roller in the facility that had a hydraulic arm break. The reason for the failure was derived from interviews collected in the field: the hydraulic arm failed because the pillow block bearing bolts broke the bearing housing. Because of production requirements, the housing was welded in the field to maintain rate. Several weeks later the hydraulic arm on the opposite end failed. The determination was that it failed because the weld job left the hydraulic arm misaligned, causing the opposite arm to operate in a bind. I thought that this was an interesting summation of the problem. I asked if they had looked into all the possibilities, and the reply was that the evidence was so clear-cut that there was no need to do a complete root cause analysis. I never heard him ask why the bearing housing broke in the first place. The analysis was based entirely on the apparent causes. The solution will probably take care of the problem for a while, but unless the true roots are discovered it will show up again.
There are ways to avoid a repeat of the same failure mechanism on the same piece of equipment. In the reliability world it’s called verification of the hypothesis. The PROACT® method of root cause analysis forces the user to look at all the possibilities and to find ways to verify each hypothesis using available technologies such as vibration analysis, metallurgical review, observation, and eddy current testing. There are enough technologies available that most hypotheses can be verified easily.
I have found that for material failures of bearings, belts, fasteners, plastic conveyor parts, shafts, couplings, and many other components, the best and fastest approach to verification is knowing how to read fractured surfaces. You can acquire this skill from trained professionals who teach fractography, or you can hire these people to evaluate your failed parts. Knowing how to do this can shorten the root cause analysis because it leads you to an understanding of the physics behind the failure. Once the physics is clear, the rest falls into place.
I have a client that learned this lesson on a chronic pump failure. The reliability team had been fighting two types of failures on this pump. The first mode was a broken shaft and the second was failed coupling bolts. The problem was on a critical piece of equipment and had the attention of top management, who were considering the use of capital money to replace the pump with a new one. The problem would surface in one of the two forms twice a month. The conventional wisdom of the group working on the problem was that it was an overload problem. This view originated with a well-respected engineer in the plant. He had a lot of successes to his name, so no one questioned his judgment on the overload verdict. The team focused on hypotheses that could cause an overload situation.
After several months of costly trial and error, the team wanted to explore a different approach. It was decided that the structure of a fault tree would help the team stay organized and cover more of the possibilities. During this period one of the team members attended a class that taught root cause analysis and how to read the fractured surfaces of failed parts. Within a few days of attending the class there was another failure of the pump. This time the team member looked at the failed parts differently and discovered that the failure was not an overload problem but a fatigue problem. The parts from previous failures were examined again, and the examination again showed that fatigue was present. Management, as well as the respected engineer, questioned the new information. This had now become a political situation that put people’s credibility at stake.
The new information was used in creating different hypotheses about the modes of failure on the pump. One hypothesis came up that led the team to a totally new focus on what could be causing the loss of the pump. Knowing now that fatigue was an issue, the team questioned whether there could be a base problem. The new hypothesis had to be verified. What technique could be used to accomplish this task? The team hired a professional to come in-house to perform an operating deflection shape test. This test would tell them whether the motor and pump were running in phase or out of phase with each other, and it would show how the motor, the pump, and the base moved relative to one another. The test was ordered at a political cost that could permanently damage the credibility of the team working on the analysis. If they did not learn the mode of failure from the test, a new pump would be ordered and the true roots never uncovered.
The test revealed that the base had some serious problems that caused the motor and the pump to run out of phase, which in turn created the fatigue condition that was failing the pump. The base was repaired and the pump continues to run trouble free. If management had replaced the pump as a solution, it most likely would have been installed on the same base and continued to fail.
Without the structure of the tree and verification the analysis could have been a waste of time and paper. Root cause analysis is a paradigm shift for most organizations. It is much easier to make reasonable assumptions and replace parts than it is to collect data, review the data, analyze the data, verify the hypotheses, write a report, implement recommendations, and track the success.
The structured method will always get you the right answers because it forces you to look at all the possibilities and verify them. It may take more of your time but will reduce the overall workload in your facility.
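For readers who like to see the structure written down, the pump case can be sketched as a small hypothesis tree in which every entry carries a verification method and a status. This is only an illustrative sketch, not the PROACT® software or its data model: the names `Hypothesis`, `Status`, `open_items`, and `confirmed_chains` are my own assumptions, and the "Shaft misalignment" entry is a hypothetical placeholder that does not come from the case.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Status(Enum):
    UNVERIFIED = "unverified"
    CONFIRMED = "confirmed"
    RULED_OUT = "ruled out"


@dataclass
class Hypothesis:
    statement: str
    method: Optional[str] = None          # how the hypothesis is to be verified
    status: Status = Status.UNVERIFIED
    children: List["Hypothesis"] = field(default_factory=list)


def open_items(node: Hypothesis) -> List[str]:
    """Return the statement of every hypothesis that is still unverified."""
    found = [node.statement] if node.status is Status.UNVERIFIED else []
    # A ruled-out branch needs no further work, so its children are skipped.
    if node.status is not Status.RULED_OUT:
        for child in node.children:
            found.extend(open_items(child))
    return found


def confirmed_chains(node: Hypothesis, path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return each chain of confirmed hypotheses, from the event down to its deepest confirmed cause."""
    if node.status is not Status.CONFIRMED:
        return []
    path = path + (node.statement,)
    deeper = [chain for child in node.children
              for chain in confirmed_chains(child, path)]
    return deeper or [path]


pump = Hypothesis(
    "Pump fails (broken shaft or failed coupling bolts)",
    status=Status.CONFIRMED,
    children=[
        Hypothesis("Overload", "fracture surface reading", Status.RULED_OUT),
        Hypothesis("Fatigue", "fracture surface reading", Status.CONFIRMED, children=[
            Hypothesis("Base problem puts motor and pump out of phase",
                       "operating deflection shape test", Status.CONFIRMED),
            # Hypothetical entry for illustration only; not part of the case.
            Hypothesis("Shaft misalignment", "alignment check"),
        ]),
    ],
)

print(open_items(pump))        # hypotheses that still need verification
print(confirmed_chains(pump))  # verified cause chains
```

The point of the sketch is that nothing counts as a cause until its status has been changed by a test, and anything left in `open_items` is still an assumption.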
About the Author
Mark Latino is President of Reliability Center, Inc. (RCI). Mark came to RCI after 19 years in corporate America, during which he acquired a wealth of reliability, maintenance, and manufacturing experience. Mark worked for Weyerhaeuser Corporation in a production role during the early stages of his career. He was an active part of Allied Chemical Corporation’s (now Honeywell) Reliability Strive for Excellence initiative, which started in the 70’s to define, understand, document, and live the Reliability culture, until he left in 1986. Mark spent 10 years with Philip Morris, primarily in a production capacity and later in a Reliability engineering role. Mark is a graduate of Old Dominion University (ODU) and holds a BS Degree in Business Management that focused on Production & Operations Management.