


Root Cause Analysis for Customer Reported Problems

Of all the kinds of problems that software development organizations face, Customer Reported Problems (CRPs) are clearly the most important. This is because CRPs represent potential gaps in your knowledge of how your customers use your software. CRPs may be the result of deficiencies in your development, test, delivery, or fulfillment processes. CRPs often result in disruptive, expensive, and unplanned releases.

When CRPs are not fully understood, they can result in poor solutions that often create more problems than they solve. Nothing frustrates customers more than a supplier who is unable to resolve problems quickly and correctly. Finding critical defects in your software is very disruptive not only for your customers but for your software development organization as well. Unplanned releases to fix CRPs divert expensive development resources from tasks that generate revenue (new features, new products, etc.) to tasks that don't generate revenue (bug fixes). Unplanned releases are clearly not good for your bottom line.

CRPs represent more than just defects. CRPs should be broadly defined to include any failure of software and services (including code, documentation, installation, customization, fulfillment, training, etc.) that negatively impacts customers.

Root Cause Analysis is routinely used to investigate the cause of major disasters including:
  • Airline crashes
  • Space Shuttle accidents
  • Chemical and nuclear plant disasters

Root Cause Analysis helps us:
  • understand causes of customer dissatisfaction
  • understand the what, the why, and the how…
  • reduce rework by preventing recurrence
  • identify process weaknesses
  • improve customer satisfaction

Root Cause Analysis Process Overview

The Root Cause Analysis Process consists of investigating, understanding, and categorizing underlying root causes of observed events. It can be best performed by a small cross-functional team and can be easily incorporated into your Defect Triage Process.

The Root Cause Analysis Process includes a detailed analysis based on gathering factual information obtained from:

  • Available documents and records
  • Interviews with staff and customers
  • Brainstorming sessions with staff

The Root Cause Analysis Process uses simple tools including:

  • Why Trees
  • Pareto Analysis
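
As commonly applied, Pareto Analysis tallies how often each root cause category occurs and ranks the categories, so the team can focus on the few that account for most CRPs. The following is a minimal sketch of that tally, not a prescribed tool. The category names, the sample data, and the 80% cutoff are assumptions made for illustration.

```python
from collections import Counter


def pareto(categories, threshold=0.8):
    """Rank categories by frequency and return (category, count,
    cumulative_share) rows, stopping once the cumulative share
    reaches the threshold (0.8 is an assumed, conventional cutoff)."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, running / total))
        if running / total >= threshold:
            break
    return rows


# Hypothetical root cause categories assigned to ten closed CRPs.
crp_root_causes = (
    ["requirements"] * 5
    + ["testing"] * 3
    + ["documentation", "installation"]
)

vital_few = pareto(crp_root_causes)
for category, count, cumulative in vital_few:
    print(f"{category:15} {count:3} {cumulative:6.0%}")
```

With this sample data, two categories (requirements and testing) account for 80% of the CRPs, which suggests where process improvements might pay off first.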

An effective Root Cause Analysis Process helps determine appropriate and effective corrective actions by identifying both an Immediate Corrective Action (what should be done today to resolve the CRP) and Long Term Corrective Action (what should be done to prevent recurrence).

In applying the Root Cause Analysis Process, the Triage Team starts with a specific CRP and asks:

  • What is it about the way we operate that allowed this CRP to occur?

Most root causes are found in the way we operate. That includes:

  • Who does what
  • How things get done
  • Why we behave the way we do

The Triage Team asks questions about "Who does what", "How things get done", and "Why we behave the way we do", in order to identify factual information that can be helpful in identifying real root causes.

In asking these questions, the Triage Team uses a tool called the Why Tree. Why Trees are similar to Fault Trees in that the event of interest (the CRP) is placed at the top. The team then asks "Why did this happen?" and starts drilling down into "Who does what", "How things get done", and "Why we behave the way we do". At each level, the team continues to ask "Why", usually at least five times (though for simpler problems, fewer than five Whys may suffice).

The following illustrates a partially completed Why Tree for a simple problem:

[Figure: Why Tree]

Answers to Why questions may need to be determined from documents (like Functional Specifications, Test Plans, User Manuals, etc.), from records (like test results, shipping invoices, etc.), from interviews with staff and customers, and from brainstorming sessions.

The information shown in green circles on the Why Tree example represents probable root causes. The Triage Team reaches consensus on the most probable root cause(s). Often, there will be more than one root cause.
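
A Why Tree can also be recorded as a simple data structure, which makes it easy to list the probable root causes the team has marked. The sketch below is one possible representation, not part of the workshop material. The `WhyNode` class, the `probable_root_causes` helper, and the sample CRP are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyNode:
    """One answer in a Why Tree; the top node is the CRP itself."""
    statement: str
    probable_root_cause: bool = False
    causes: List["WhyNode"] = field(default_factory=list)

    def why(self, statement, probable_root_cause=False):
        """Record an answer to "Why did this happen?" and return it."""
        child = WhyNode(statement, probable_root_cause)
        self.causes.append(child)
        return child


def probable_root_causes(node, depth=0):
    """Return (number_of_whys, statement) for every marked node."""
    found = []
    if node.probable_root_cause:
        found.append((depth, node.statement))
    for cause in node.causes:
        found.extend(probable_root_causes(cause, depth + 1))
    return found


# Hypothetical CRP, drilled down through a few levels of "Why".
crp = WhyNode("CRP: installer fails on customer's server")
path = crp.why("Installer assumes the default install path")
untested = path.why("Non-default paths were never tested")
untested.why("Test Plan does not cover custom installations", True)
path.why("Functional Specification does not mention install paths", True)

for whys, statement in probable_root_causes(crp):
    print(f"after {whys} whys: {statement}")
```

Here the marked nodes play the role of the green circles in the figure: the team would review them and reach consensus on the most probable root cause(s).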

Using the Why Tree, the Triage Team develops an Immediate Corrective Action (which could be a workaround, hot fix, patch, new CDs, new doc, etc.). The team also identifies effectiveness checks that can determine if the Immediate Corrective Action, once implemented, has effectively resolved the CRP.

Once the Immediate Corrective Action is implemented and the effectiveness checks are satisfactory, the Triage Team decides whether a Long Term Corrective Action is needed. A Long Term Corrective Action is appropriate if the root cause points to systemic problems. If so, the team begins to develop one by:

  • Reviewing existing processes and procedures
  • Identifying process weaknesses directly related to root cause
  • Identifying potential process and procedure changes
  • Identifying long term effectiveness checks

Once the team has completed work on the Long Term Corrective Action, it can be presented to Management and implemented. The team then collects data to determine if long term effectiveness checks are satisfactory.
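
The sequence of corrective action steps described above can be sketched as a small state machine. This is an illustration only. The `Step` names and the `next_step` function are hypothetical, and returning to the action step when an effectiveness check fails is an assumption, since the text does not say what happens in that case.

```python
from enum import Enum, auto


class Step(Enum):
    IMMEDIATE_ACTION = auto()    # workaround, hot fix, patch, etc.
    IMMEDIATE_CHECKS = auto()    # effectiveness checks on the fix
    LONG_TERM_DECISION = auto()  # does the root cause look systemic?
    LONG_TERM_ACTION = auto()    # develop, present, and implement
    LONG_TERM_CHECKS = auto()    # collect data on long term checks
    CLOSED = auto()


def next_step(step, checks_ok=True, systemic=False):
    """Return the step that follows `step`."""
    if step is Step.IMMEDIATE_ACTION:
        return Step.IMMEDIATE_CHECKS
    if step is Step.IMMEDIATE_CHECKS:
        # Assumption: unsatisfactory checks send the team back to rework.
        return Step.LONG_TERM_DECISION if checks_ok else Step.IMMEDIATE_ACTION
    if step is Step.LONG_TERM_DECISION:
        return Step.LONG_TERM_ACTION if systemic else Step.CLOSED
    if step is Step.LONG_TERM_ACTION:
        return Step.LONG_TERM_CHECKS
    if step is Step.LONG_TERM_CHECKS:
        # Assumption: unsatisfactory checks reopen the long term action.
        return Step.CLOSED if checks_ok else Step.LONG_TERM_ACTION
    return Step.CLOSED


def walk(systemic):
    """Follow a CRP through the steps, assuming every check passes."""
    step = Step.IMMEDIATE_ACTION
    path = [step]
    while step is not Step.CLOSED:
        step = next_step(step, checks_ok=True, systemic=systemic)
        path.append(step)
    return [s.name for s in path]


print(walk(systemic=False))
print(walk(systemic=True))
```

A CRP whose root cause is not systemic closes after the Long Term decision, while a systemic one passes through the Long Term Corrective Action and its effectiveness checks first.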


Intended Audience

The intended audience for this workshop includes Project Managers and Triage Team members, including QA, Development, and Technical Support. Project Teams should attend this training together, if possible. Once a Triage Team has been trained, I frequently facilitate the first few Triage Meetings where Root Cause Analysis is involved.


Tailoring

This workshop can be tailored to meet your specific project needs and development process.

Call for details...




For further information,

call Steve Rakitin at 508.529.4282

or e-mail him at steve@swqual.com



Food for Thought and Predictable Software Development are trademarks of Software Quality Consulting, Inc.
Copyright ©2008 Software Quality Consulting, Inc. All rights reserved.

Updated January 2008