How to Perform Root Cause Investigations?

Updated: August 28, 2024

When failures occur, they not only disrupt productivity but can also lead to significant costs and safety risks. To prevent these issues from recurring, it’s essential to identify the root cause of the problem. Root Cause Investigation (RCI) is a powerful tool that helps organizations get to the heart of failures, ensuring long-term reliability and operational excellence.

The purpose of an investigation is to determine the root cause of existing or potential nonconformities, whenever possible, and to recommend solutions. The magnitude and scope of the investigation should be commensurate with the determined risk of the nonconformity.

What is a Root Cause Investigation?

A Root Cause Investigation is a methodical process used to uncover the underlying reasons for failures or problems within a system. Unlike quick fixes that address only the symptoms, RCI seeks to identify and eliminate the root cause, preventing the issue from happening again. This proactive approach is vital for improving system reliability, safety, and efficiency.

Without RCI, organizations may find themselves repeatedly dealing with the same issues, leading to unnecessary downtime, increased costs, and potential safety hazards. By thoroughly investigating the root cause of a problem, companies can implement permanent solutions, resulting in:

  • Reduced Downtime: Addressing the root cause helps prevent repeated failures, ensuring smoother and more reliable operations.
  • Cost Savings: Avoiding recurring problems reduces the need for frequent repairs and associated costs.
  • Improved Safety: Eliminating the root cause of failures minimizes the risk of accidents and enhances workplace safety.

How to Get Started With Root Cause Investigation

It is considered best practice to establish a documented plan before initiating any investigation. This plan should cover the following key elements:

  • A clear problem statement describing the nonconformity
  • The scope of the investigation
  • The investigation team and their respective responsibilities
  • A detailed description of the activities to be undertaken
  • Required resources
  • Methods and tools to be used
  • The timeframe for completion
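
As a rough illustration, the plan elements above could be captured in a small record that reports which elements are still empty before the investigation starts. This is only a sketch: the class name `InvestigationPlan`, its field names, and the sample values are hypothetical and not part of any standard or tool.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class InvestigationPlan:
    """Hypothetical record mirroring the plan elements listed above."""
    problem_statement: str = ""
    scope: str = ""
    team: dict = field(default_factory=dict)            # member -> responsibility
    activities: list = field(default_factory=list)
    resources: list = field(default_factory=list)
    methods_and_tools: list = field(default_factory=list)
    due_date: Optional[date] = None

    def missing_elements(self) -> list:
        """Return the names of plan elements that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example values, for illustration only.
plan = InvestigationPlan(
    problem_statement="Pump seal failed three times in six months",
    scope="Seal assembly and lubrication system of the pump",
    team={"Lead": "facilitation", "Technician": "maintenance records"},
    methods_and_tools=["5 Whys", "Fishbone diagram"],
)
print(plan.missing_elements())  # ['activities', 'resources', 'due_date']
```

A check like this only confirms that every element has been filled in; whether the content is adequate remains a judgment for the investigation team.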

The problem statement should be reviewed and refined as information is obtained throughout the process. The investigation should:

  • Determine the full extent of the nonconformity or potential nonconformity.
  • Recognize that an event may have multiple causes, and therefore, the investigation should not be concluded prematurely.
  • Differentiate between symptoms and root causes, emphasizing the need to address root causes rather than just the symptoms.
  • Define a clear endpoint for the investigation to avoid unnecessary delays or additional costs. For example, if addressing the causes identified so far can correct 80% of the effects, it is likely that the significant causes have been identified (Pareto principle).
  • Consider the outputs from relevant risk management activities.
  • Agree on the form of evidence required, ensuring it supports:
    • The seriousness of the event
    • The likelihood of the event occurring
    • The significance of the consequences resulting from the event
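
The Pareto-based endpoint mentioned above can be sketched in a few lines: rank the candidate causes by how many observed effects they account for, then stop once the cumulative share reaches the chosen threshold. The function name `pareto_cutoff`, the 80% default, and the counts below are assumptions made for illustration.

```python
def pareto_cutoff(effect_counts: dict, threshold: float = 0.80) -> list:
    """Return the top causes (largest first) whose cumulative share of
    observed effects first reaches `threshold`."""
    total = sum(effect_counts.values())
    if total == 0:
        return []
    selected, cumulative = [], 0
    ranked = sorted(effect_counts.items(), key=lambda kv: kv[1], reverse=True)
    for cause, count in ranked:
        selected.append(cause)
        cumulative += count
        if cumulative / total >= threshold:
            break
    return selected


# Made-up counts of observed effects attributed to each candidate cause.
counts = {
    "worn seal": 42,
    "misalignment": 31,
    "wrong lubricant": 12,
    "operator error": 9,
    "unknown": 6,
}
print(pareto_cutoff(counts))  # ['worn seal', 'misalignment', 'wrong lubricant']
```

Here the first three causes account for 85 of 100 effects, so under this rule of thumb the remaining causes would not justify extending the investigation on their own.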

Data collection should be part of the investigation to facilitate analysis. The investigation should build on any prior analysis, evaluation, or investigation (as referenced in section 5.0). To do so, the investigator identifies, defines, and further documents the observed effects or nonconformity, or the causes already determined, so that the context and extent of the investigation are fully understood. It may be necessary to:

  • Review and clarify the information provided
  • Analyze any additional information available through horizontal analysis
  • Determine whether the issue is systemic or non-systemic
  • Gather further evidence, if required
  • Interview process owners, operators, or other involved parties
  • Review relevant documents
  • Inspect the facilities or the environment where the event occurred

Past investigations should also be reviewed to determine whether the event is a new problem or a recurrence of a previously identified issue, possibly due to the implementation of an ineffective solution. The following questions can assist in making this determination:

  • Is the nonconformity identified from a single data source?
  • Does the current nonconformity correlate with issues identified in other data sources?
  • Are multiple data sources pointing to the same nonconformity?
  • Do other nonconformities influence the problem currently under investigation?
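
One way to approach the first three questions is to group reported nonconformities by the data source that raised them and see which ones are corroborated by more than one source. The sketch below assumes each record is a simple `(data_source, nonconformity_code)` pair; the source names, codes, and function names are invented for the example.

```python
from collections import defaultdict


def sources_per_nonconformity(records: list) -> dict:
    """Group (data_source, nonconformity_code) records by nonconformity."""
    by_code = defaultdict(set)
    for source, code in records:
        by_code[code].add(source)
    return dict(by_code)


def corroborated(records: list, min_sources: int = 2) -> list:
    """Nonconformities reported by at least `min_sources` distinct sources."""
    grouped = sources_per_nonconformity(records)
    return sorted(code for code, srcs in grouped.items() if len(srcs) >= min_sources)


# Made-up records for illustration.
records = [
    ("complaints", "NC-seal-leak"),
    ("maintenance_log", "NC-seal-leak"),
    ("audit", "NC-label-missing"),
    ("complaints", "NC-seal-leak"),
]
print(corroborated(records))  # ['NC-seal-leak']
```

A nonconformity that appears in only one source is not necessarily minor; the grouping simply shows where several sources point to the same issue.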

Many investigation tools are based on identifying the cause-and-effect relationship between an event and its symptoms. To ensure that the investigation identifies causes rather than symptoms, the following should be considered:

  • Provide a clear description of each cause and its effect, ensuring the link between the cause and the undesired outcome is well defined.
  • Describe the combined conditions that contribute to the undesired effect for each identified cause.
  • Consider a failure to act as a cause only if there was a pre-existing obligation to act, arising from procedures, regulations, standards, guidelines, or other reasonably expected actions.

A failure to act is only considered a cause if there was a pre-existing requirement to act. The requirement to act may arise from a procedure, or may also arise from regulations, standards or guidelines for practice, or other reasonably expected actions.
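
The considerations above can be expressed as a simple check on each cause statement: both the cause and its effect must be described, and a failure to act qualifies only when a pre-existing obligation is recorded. The `CauseStatement` class, its fields, and the sample statements (including the procedure reference) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CauseStatement:
    cause: str
    effect: str
    is_failure_to_act: bool = False
    obligation: Optional[str] = None  # procedure, regulation, standard, guideline, ...

    def problems(self) -> list:
        """List reasons this statement is not yet acceptable as a cause."""
        issues = []
        if not self.cause.strip() or not self.effect.strip():
            issues.append("cause and effect must both be described")
        if self.is_failure_to_act and not self.obligation:
            issues.append("failure to act needs a pre-existing obligation")
        return issues


# Made-up statements for illustration.
ok = CauseStatement(
    cause="Weekly lubrication check was skipped",
    effect="Seal ran dry and failed",
    is_failure_to_act=True,
    obligation="Hypothetical maintenance procedure requiring a weekly check",
)
bad = CauseStatement(
    cause="Operator did not perform a second inspection",
    effect="Defective unit shipped",
    is_failure_to_act=True,
)
print(ok.problems())   # []
print(bad.problems())  # ['failure to act needs a pre-existing obligation']
```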

Some of the more common tools and techniques include:

  • Cause-and-effect (Fishbone/Ishikawa) diagrams
  • 5 Whys analysis
  • Pareto charting
  • Change analysis
  • Risk analysis techniques

The outcome of an investigation should include:

  • Clearly defined problem statement
  • What information was gathered, reviewed and/or evaluated
  • Results of the reviews/evaluations of the information
  • Identification of cause(s) or contributing factors
  • Solutions to address the cause(s) or contributing factor(s) 

Step-by-Step Approach to Root Cause Investigation

Building on the guidance above, a Root Cause Investigation can be conducted effectively by following these structured steps:

  1. Define the Problem
    Clearly identify and describe the issue at hand. This involves gathering information about what happened, when it happened, and the impact it had on operations. The goal is to create a precise problem statement that guides the investigation.
  2. Collect Data
    Gather all relevant data related to the problem. This includes operating conditions, maintenance records, logs, and any other documentation that could provide insights into the failure. Accurate data collection is crucial for a thorough investigation.
  3. Identify Possible Causes
    Use tools like brainstorming, Fishbone (Ishikawa) diagrams, or the 5 Whys technique to explore all potential causes of the problem. Encourage team members to think broadly and consider all possible factors that could have contributed to the issue.
  4. Analyze the Causes
    Narrow down the list of possible causes to identify the most likely root cause(s). This step often involves further data analysis, testing, or simulation to confirm the findings. The aim is to pinpoint the exact cause that, when addressed, will prevent the problem from recurring.
  5. Develop Solutions
    Once the root cause is identified, brainstorm and evaluate potential solutions. Choose the solution that best addresses the root cause and is feasible to implement. Consider factors such as cost, time, and the potential impact on operations.
  6. Implement the Solution
    Put the chosen solution into action. This may involve changes to processes, equipment modifications, or updates to training programs. Ensure that the implementation is carried out effectively and that all stakeholders are informed.
  7. Monitor and Verify
    After the solution is implemented, closely monitor the situation to ensure that the problem is resolved. Verify that the root cause has been effectively addressed and that no new issues have arisen. This step is crucial for ensuring the long-term success of the solution.
  8. Document and Share Findings
    Document the entire investigation process, including the problem definition, data collected, analysis, solutions, and results. Share these findings with relevant stakeholders to ensure that the knowledge is retained and can be applied in future situations.
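
For step 7, one simple way to check whether a solution is working is to compare the failure rate over comparable periods before and after implementation. The sketch below is only illustrative: the function names, the "failures per 1,000 operating hours" measure, the 50% reduction target, and the sample figures are all assumptions, and a rate comparison like this is not a substitute for proper statistical verification.

```python
def failure_rate(failures: int, operating_hours: float) -> float:
    """Failures per 1,000 operating hours."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return 1000 * failures / operating_hours


def recurrence_reduced(before: tuple, after: tuple, min_reduction: float = 0.5) -> bool:
    """True if the failure rate fell by at least `min_reduction` (a fraction).

    `before` and `after` are (failures, operating_hours) pairs.
    """
    rate_before = failure_rate(*before)
    rate_after = failure_rate(*after)
    if rate_before == 0:
        return rate_after == 0
    return (rate_before - rate_after) / rate_before >= min_reduction


# Made-up figures: 6 failures in 4,000 h before the fix, 1 failure in 4,000 h after.
print(failure_rate(6, 4000.0))                          # 1.5
print(recurrence_reduced((6, 4000.0), (1, 4000.0)))     # True
```

If the rate has not dropped as expected, that is a signal to revisit the analysis in step 4 rather than to close the investigation.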

Conclusion

Root Cause Investigation is a critical component of maintaining and enhancing the reliability of your operations. By following a structured approach to identify and eliminate the root causes of problems, you can achieve significant improvements in uptime, safety, and cost-effectiveness. To further enhance your RCA processes, consider exploring these 3 steps to improve root cause analysis, which can provide additional insights and practical strategies. Investing time and resources into RCI is not just about solving today’s problems; it’s about building a stronger, more resilient organization for the future.

___________

I hope you found this article insightful and actionable! Stay tuned for more thought-provoking articles as we continue to share our knowledge. Success is rooted in a thorough understanding and consistent application, and we hope this article was a step in unlocking the full potential of Root Cause Analysis for your organization.

Reliability runs initiatives such as an online learning center focused on the proprietary PROACT® RCA methodology and EasyRCA.com software.
