An e-newsletter published by
Software Quality Consulting, Inc.

November 2005, Vol. 2 No. 10



Welcome to Food for Thought™, an e-newsletter from Software Quality Consulting. I've created free subscriptions for my valued business contacts. If you find this newsletter informative, I encourage you to continue reading. Feel free to pass this newsletter along to colleagues by clicking this Forward Email link. If you’ve received this newsletter from a colleague and would like to subscribe, please click this Enter New Subscription link. If you don't wish to receive this newsletter, click the SafeUnSubscribe™ link at the bottom of this newsletter, and you won’t be bothered again.

Your continued feedback on this newsletter is most welcome. Please send your comments and suggestions to info@swqual.com.



 

In this month’s topic, I discuss software bugs...

Regular features to look for each month are:

  • Monthly Morsels
    Hints, tips, techniques and reference info related to this month’s topic
  • Calendar
    Conferences, workshops, and meetings of interest to software engineers, QA engineers and anyone interested in software development



 

A Bug’s Life

Do you know why we call them bugs? Most people are not familiar with the etymology of the term ‘bug’. The term was used much earlier than you might think. Thomas Edison used the term in a letter to an associate in 1878. He wrote:

“It has been just so in all of my inventions. The first step is an intuition, and comes with a burst, then difficulties arise—this thing gives out and [it is] then that ‘Bugs’—as such little faults and difficulties are called—show themselves and months of intense watching, study and labor are requisite before commercial success or failure is certainly reached.” [1]

In fact, the use of ‘bug’ to refer to a technical problem can be found in an electrical handbook from 1896, which says: "The term ‘bug’ is used to a limited extent to designate any fault or trouble in the connections or working of electric apparatus." [2]


A moth taped to Admiral Hopper’s Lab notebook, now on display at the Smithsonian

 

The term, as applied to software, is often incorrectly attributed to Admiral Grace Hopper. Admiral Hopper, a brilliant mathematician, worked at Harvard on the Mark II Aiken Relay Calculator, an early computer built from hundreds of electromechanical relays. She liked to tell a story about an event that occurred in late summer of 1947, well before air conditioning was common, so the windows were open most of the time. A technician solved a problem with the Mark II machine by pulling an actual insect (a moth) out from between the contacts of one of its relays. Admiral Hopper’s notebook entry from September 9, 1947 reads:

"1545 Relay #70 Panel F (moth) in relay. First actual case of bug being found". [3]

This wording establishes that the term was already in use at the time in its current sense.

Admiral Hopper’s contributions to computer science and software engineering have been extremely important. Among her significant accomplishments, she developed the first compiler as a tool to make writing programs easier.

Read more about Grace Hopper…

Some Famous Bugs…

 

Like the characters in the Disney movie, some software bugs have achieved a measure of fame related to their economic and social impact.

  • The Ariane 5 Bug

    On June 4, 1996 the first launch of the European Ariane 5 expendable rocket booster ended with the rocket breaking apart about 40 seconds after liftoff. The Ariane 5 reused some guidance system software from the Ariane 4, but the Ariane 5's flight path was considerably different and beyond the range for which the reused code had been designed. The loss of the rocket and its payload of 4 satellites was estimated at several hundred million dollars. According to the review board that investigated the accident:

    “The failure of the Ariane 501 was caused by the complete loss of guidance and attitude information 37 seconds after start of the main engine ignition sequence (30 seconds after liftoff). This loss of information was due to specification and design errors in the software of the inertial reference system. The extensive reviews and tests carried out during the Ariane 5 Development Programme did not include adequate analysis and testing of the inertial reference system or of the complete flight control system, which could have detected the potential failure.” [4]
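
As a rough illustration of how reused code can fail outside its design range: the inquiry board's report traces the failure to a conversion of a 64-bit floating-point value into a 16-bit signed integer that overflowed. The Python sketch below is not the flight code; the function names and the sample readings are hypothetical and only show the difference between a checked conversion that raises and one that clamps.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Raised when a value does not fit in a signed 16-bit integer."""


def to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer, failing loudly if out of range."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return result


def to_int16_saturating(value: float) -> int:
    """Alternative: clamp out-of-range values instead of raising."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


# Hypothetical readings: one inside the old design range, one beyond it.
in_range_reading = 20000.0
out_of_range_reading = 50000.0

print(to_int16(in_range_reading))                 # fits in 16 bits
print(to_int16_saturating(out_of_range_reading))  # clamped to INT16_MAX
try:
    to_int16(out_of_range_reading)
except OperandError as exc:
    print("conversion failed:", exc)
```

Which behavior is right depends on the system; the point is that the choice has to be re-examined whenever code is reused with a different range of inputs.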

  • The NASA English/Metric System Bug

    On September 23, 1999 the Mars Climate Orbiter was lost as it tried to enter orbit around Mars. A NASA review board investigating the loss found that one part of the engineering team assumed calculations were in English units while another part assumed they were in metric units. Oops! That cost us taxpayers a mere $125 million. (On the plus side, since this incident NASA has mandated the use of Independent Verification and Validation (IV&V) on all projects that involve software.)

  • The Air Traffic Control System Bug

    September 14, 2004 was not a good day to be flying in the Los Angeles area as Air Traffic Controllers lost voice contact with over 400 airplanes in the airspace around LA. The main system used to communicate with pilots (called Voice Communications Systems Unit - VCSU) failed. A backup system also failed. Pilots were essentially flying blind, not knowing what other planes were in their path. There were at least five documented cases where planes came within the minimum separation distance mandated by the FAA. Read on…

    “Inside the control system unit is a countdown timer that ticks off time in milliseconds. The VCSU uses the timer as a pulse to send out periodic queries to the VSCS. It starts out at the highest possible number that the system’s server and its software can handle—2^32. It’s a number just over 4 billion milliseconds. When the counter reaches zero, the system runs out of ticks and can no longer time itself. So it shuts down.” [5]

    In order for this system to work, it requires that a technician manually reboot the system every 30 days – talk about a Rube Goldberg…
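
The arithmetic behind the quoted countdown is easy to check. This small Python sketch (the function and variable names are mine, not from the system) converts a counter of 2^32 milliseconds into days, which shows how much margin a 30-day reboot schedule leaves.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def days_until_zero(start_ticks: int) -> float:
    """Days a millisecond countdown lasts when started at start_ticks."""
    return start_ticks / MS_PER_DAY


counter_start = 2 ** 32
print(counter_start)                              # 4294967296
print(round(days_until_zero(counter_start), 1))   # 49.7
```

So the counter runs out after a little under 50 days, and a reboot every 30 days only works as long as nobody forgets to do it.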

  • The Therac-25 Accidents

    Therac-25 was a software-controlled radiation therapy machine used to treat cancer. From 1985 to 1987, six known accidents involving these machines delivered massive overdoses of radiation to patients, resulting in deaths and serious injuries. Software errors were identified as one of several contributing factors in this series of accidents. [6] Other factors included:
    • management that was initially dismissive of early problem reports
    • poor system documentation
    • assumptions about the hardware that were not reasonable
    • lack of independent review of the code
    • code reused without considering hardware differences

The Therac-25 example is often cited as the defining event that led to increased scrutiny of software in safety-critical applications.

Read more about the Therac-25 Accidents…

  • The Patriot Missile Bug

    On February 25, 1991, an Iraqi Scud missile hit a US Army barracks in Dhahran, Saudi Arabia, killing 28 soldiers. A Patriot missile battery was supposed to protect the base from incoming Scud missiles but it failed to do so in this case.

    A US government investigation revealed that the failure of the Patriot missile battery was caused by a software error in the system's clock. The Patriot missile battery at Dhahran had been in operation for 100 hours, after which time the clock had drifted by one third of a second, equivalent to a position error of 600 meters. The radar system detected the Scud and predicted where to look for it next, but because of the clock error, it looked in the wrong part of the sky and found nothing. With no missile to track, the initial detection was assumed to be a false alarm and the incoming missile was removed from the system. The Israelis had identified the problem and informed the US Army on February 11, 1991. The Patriot missile system manufacturer fixed the problem and supplied updated software to the Army on February 26, the day after the Scud struck Dhahran. [7]
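
The one-third-of-a-second drift can be reproduced with a short calculation. The sketch below assumes the widely reported mechanism, which the text above does not spell out: the clock counted tenths of a second, and the value 1/10 was stored in a fixed-point register with its binary expansion chopped off. The number of fraction bits (23) and the Scud speed used here are assumptions for illustration, and all names are mine.

```python
import math
from fractions import Fraction

TICK_SECONDS = Fraction(1, 10)  # the clock counts tenths of a second
FRACTION_BITS = 23              # assumption: bits kept after the binary point


def truncate(value: Fraction, bits: int) -> Fraction:
    """Chop a non-negative value to the given number of binary fraction bits."""
    scale = 2 ** bits
    return Fraction(math.floor(value * scale), scale)


stored_tick = truncate(TICK_SECONDS, FRACTION_BITS)
error_per_tick = TICK_SECONDS - stored_tick  # tiny, but it accumulates

ticks_in_100_hours = 100 * 60 * 60 * 10
drift_seconds = float(error_per_tick * ticks_in_100_hours)

ASSUMED_SCUD_SPEED_M_PER_S = 1700  # rough, assumed figure
position_error_m = drift_seconds * ASSUMED_SCUD_SPEED_M_PER_S

print(f"drift after 100 hours: {drift_seconds:.2f} s")
print(f"approximate position error: {position_error_m:.0f} m")
```

Under these assumptions the drift comes out at about a third of a second, and the position error is of the same order as the 600 meters cited above.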

These are just a few examples of bugs that have cost huge sums of money and have adversely affected people’s lives. There are more. The impact that software bugs can have raises many questions. I’ll discuss three of these questions…

1. Where do bugs come from?

Speaking about bugs, Dr. Harlan Mills once said:

“Programs do not acquire bugs as people acquire germs, by hanging around other buggy programs. Programmers must insert them.”

Hmmm. Programmers must insert them… Well, why do they do that? One of the most common reasons is they often lack clearly written requirements. Without clear requirements, most respectable programmers will guess. Half of the time, they’ll guess right and you know what happens the other half of the time…

The bottom line - it’s important to understand how and why bugs find their way into your products. By identifying the real root cause, you will learn a lot about a bug’s life.

Read more about root cause analysis…

Root cause analysis of the bugs listed above has provided important insight into how those bugs came to be. For example, it took a combination of several “failures” to cause the accidents and resulting deaths in the Therac-25 radiation therapy machines. Understanding the root cause of problems is the first step in making sure that those problems don’t occur again.

2. Can we do a better job of preventing bugs?

Several techniques can be used to prevent bugs from being “inserted” in the first place. These techniques include:

  • Write requirements that are clear
  • Design software to be robust and fault tolerant
  • Develop and follow good coding practices

Using English to express requirements is fraught with problems because, as we all know, English is inherently vague and ambiguous. This leads to lots of guessing and you know what happens when we guess. One way to write clear requirements is to recognize the limitations of English and use alternative techniques. Alternative techniques can help reduce ambiguity by expressing requirements in a manner that leads to better understanding, a more coherent design, and thorough testing.

Some examples of alternative techniques for writing requirements include:

  • Work Flow Diagrams
  • Flowcharts
  • Structured English
  • Truth Tables

Work flow diagrams and flowcharts are excellent tools for expressing requirements in a way that leads to better understanding. I provided some examples of using Structured English in my e-newsletter on Requirements.

Here is an interesting example of comparing requirements written in English and the same requirements expressed in a truth table. Which is easier to understand, implement, and test?

Truth Tables

When to use

Use when dealing with discrete variables that are related

How to use

Create a table that lists all possible values for each variable. Each row represents combinations that should be tested…


Requirements expressed in English
(OP = Old Password NP = New Password)

User enters NP. Application determines if password meets these rules.

  • If OP is correct, NP and Confirm NP are same, pass configuration edit, and has not been used during prior two changes, confirm change with a successfully changed password message. An “OK” button brings user to Home page. Error Message = Password successfully changed.
  • If OP is correct and NP and Confirm NP match but do not conform to configuration settings, a message describing error is displayed. OP, NP and Confirm NP are blank after user selects “OK” on error message. “OK” button brings user to Change Password screen. Error Message = Password you entered does not conform to format specified by your system administrator. Enter a valid password.
  • If OP is valid and NP and Confirm NP do not match, a message describing error is displayed. OP, NP and Confirm NP are blank after user selects “OK” on error message. “OK” button brings user to Change Password screen. Error Message = NP and Confirm NP entries do not match. Try again.
  • If OP is correct and NP and Confirm NP match and pass configuration but has been used prior by user during previous two changes a message is displayed. OP, NP and Confirm NP are blank after user selects “OK” on error message. “Ok” button brings user to Change Password screen. Error Message – Password used too recently. Try again.”
  • If NP and Confirm NP match and pass configuration but OP is invalid, a message is displayed. OP, NP and Confirm NP are blank after user selects “OK” on error message. “Ok” button brings user to Change Password screen. Error Message – OP entered is invalid. Try again.


Same requirements expressed in a Truth Table
(OP = Old Password NP = New Password)


OP          NP and Confirm   Password Rules   NP not used in     Password change   Display
Confirmed   NP match         followed         last two changes   successful?       Message
---------   --------------   --------------   ----------------   ---------------   -------
TRUE        TRUE             TRUE             TRUE               Yes               1
TRUE        TRUE             FALSE            DON’T CARE         No                2
TRUE        FALSE            DON’T CARE       DON’T CARE         No                3
TRUE        TRUE             TRUE             FALSE              No                4
FALSE       DON’T CARE       DON’T CARE       DON’T CARE         No                5

Messages:

  1. “Password successfully changed”
  2. “The password entered does not conform to the format specified by your sys admin. Enter a valid password.”
  3. “New Password and Confirm New Password entries do not match. Try again.”
  4. “The password you entered has been used recently. Try again.”
  5. “The Old Password you entered is invalid. Try again.”

    By using alternative techniques to express requirements, the requirements become less ambiguous, more understandable and more testable. This leads to fewer bugs being inserted.
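
One reason the truth table is easier to implement and test is that it maps almost directly onto code. The Python sketch below is one possible reading of the table, not a reference implementation; the function name, parameter names and the order in which the conditions are checked are my own choices. It enumerates all 16 combinations of the four conditions so that every row, including the DON'T CARE cases, can be checked.

```python
from itertools import product

MESSAGES = {
    1: "Password successfully changed",
    2: "The password entered does not conform to the format specified "
       "by your sys admin. Enter a valid password.",
    3: "New Password and Confirm New Password entries do not match. Try again.",
    4: "The password you entered has been used recently. Try again.",
    5: "The Old Password you entered is invalid. Try again.",
}


def change_password_outcome(op_confirmed, np_matches_confirm,
                            rules_followed, not_recently_used):
    """Return (change_successful, message_number) for one table row."""
    if not op_confirmed:
        return (False, 5)   # remaining columns are DON'T CARE
    if not np_matches_confirm:
        return (False, 3)   # remaining columns are DON'T CARE
    if not rules_followed:
        return (False, 2)   # last column is DON'T CARE
    if not not_recently_used:
        return (False, 4)
    return (True, 1)


# Every combination of the four conditions is a candidate test case.
for combo in product([True, False], repeat=4):
    success, message = change_password_outcome(*combo)
    print(combo, "Yes" if success else "No", MESSAGES[message])
```

Only one of the 16 combinations leads to a successful change, which is exactly what the first row of the table says.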

3. Can we do a better job of detecting bugs before they escape?

The third and last question I wanted to discuss is how can we do a better job of detecting bugs before we ship? The two most common techniques used are:

  • Peer Reviews
  • Testing

Three separate organizations (shown in the table below) have recognized peer reviews as a best practice.

 

Best Practices                        Airlie        SEI CMM [9]   Best in Class
                                      Council [8]   (Level)       Companies [10]
-----------------------------------   -----------   -----------   --------------
Risk Management                       x
Requirements Definition               x             x (2)         x
Peer Reviews                          x             x (3)         x
Binary Gates at Inch-pebble Level     x             x (2)
Software Quality Assurance                          x (2)         x
Defect Tracking vs. Quality Targets   x             x (4)         x
Configuration Management              x             x (2)
People-aware Management               x

There are generally two problems with regard to peer reviews:

  • We don’t teach people how to do them
  • We don’t do them

Ask people in your company if they have ever had any formal training in how to do peer reviews. Most people have had little or no training. And since many organizations lack good estimating and scheduling skills, projects are, more often than not, behind schedule. When projects are behind schedule, activities such as peer reviews are likely to get cut.

Many companies rely too heavily on testing. Most people are not aware of the limitations that are inherent in testing. For example, the input space for every non-trivial application is essentially infinite. When we write tests, we are covering an infinitesimally small percentage of that input space. This means that we must have the skills to select a very, very small set of tests that will hopefully uncover as many bugs as possible. The ability to do this is affected by several factors, including the clarity and testability of the requirements, the time allotted for testing, the test environment, the skills of the testers, etc.
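
To get a feel for how large an input space can be, consider a hypothetical function that takes just two 32-bit integers. The Python sketch below (the function name and the test rate of one billion tests per second are assumptions chosen for illustration) estimates how long exhaustive testing would take.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_test_exhaustively(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every possible input of the given bit width."""
    return 2 ** input_bits / tests_per_second / SECONDS_PER_YEAR


# Two 32-bit inputs give 64 bits of input space.
years = years_to_test_exhaustively(input_bits=64, tests_per_second=1e9)
print(f"{years:.0f} years")  # roughly 585 years
```

Real applications have far more than 64 bits of input, so any practical test suite samples only a tiny fraction of the possibilities.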

And for the same reason that peer reviews are often cut (we’re behind schedule), testing is also often cut to meet deadlines. As a result, many bugs that could have been found are not.

Summary

Software entomophobia is the fear of software bugs. Companies need to understand how bugs are introduced so that they can take effective steps to prevent and detect bugs before those bugs affect their customers and the lives of others.

Don’t be a software entomophobiac.

 

Pay it Forward

If you find this newsletter of value, please consider the following:

Norm Kerth is a highly respected consultant who developed the Project Retrospective techniques discussed in the July-Aug newsletter. He was in a serious car accident and suffered a disabling brain injury. As a result, he cannot work and lives on a very limited income. You can help recognize his contribution to our industry by sending a small donation. Checks can be made payable to Norm Kerth Benefit Fund and sent to Norm Kerth Benefit Fund c/o Process Impact, 11491 SE 119th Drive, Clackamas, OR 97015-8778. You can also visit Karl Wiegers’ website (Process Impact) for more details about contributing to the fund. Thanks.

Read more about the Pay It Forward foundation…


 

Every month in this space you’ll find additional information related to this month’s topic.

  • References:

    [1] Correspondence - Edison to Puskas, 13 November 1878, Edison papers, Edison National Laboratory, U.S. National Park Service, West Orange, N.J., cited in Thomas P. Hughes, American Genesis: A History of the American Genius for Invention, Penguin Books, 1989, on page 75.

    [2] Hawkins, N., Hawkins' New Catechism of Electricity, Theo. Audel & Co., 1896.

    [3] "Annals of the History of Computing", IEEE, Vol. 3, No. 3 (July 1981), pp. 285-286.

    [4] Report by the Inquiry Board, ARIANE 5 Flight 501 Failure, Paris, 19 July 1996.

    [5] Geppert, L., “Lost Radio Contact Leaves Pilots on Their Own”, IEEE Spectrum, November 2004.

    [6] Leveson, N. and Turner, C., “An Investigation of the Therac-25 Accidents”, IEEE Computer, July 1993.

    [7] Wiener, L., Digital Woes: Why We Should Not Depend on Software, Addison-Wesley, 1993.

    [8] Yourdon, E., Rise & Resurrection of the American Programmer, Prentice-Hall, 1998.

    [9] Paulk, M. C., et al., The Capability Maturity Model: Guidelines for Improving the Software Process, Reading, MA: Addison-Wesley, 1995.

    [10] Jones, C., Software Quality: Analysis and Guidelines for Success, Boston, MA: International Thomson Computer Press, 1997.

  • Books

    One of the best books on peer reviews:

    Wiegers, K., Peer Reviews in Software: A Practical Guide, Addison-Wesley, 2002.

    The classic text on walkthroughs:

    Freedman, D., and Weinberg, G., Handbook of Walkthroughs, Inspections, and Technical Reviews, 3rd ed., Dorset House, 1982.

    Collection of seminal papers on inspections:

    Wheeler, D., et al., Software Inspection: An Industry Best Practice, IEEE Computer Society Press, 1996.

    Excellent book on developing software for safety-critical applications:

    Leveson, N. G., Safeware: System Safety and Computers, Addison-Wesley, 1995.

    My book has two chapters on inspections and several appendices with inspection information:

    Rakitin, S. R., Software Verification and Validation for Practitioners and Managers, 2nd ed., Artech House, 2001.



 

Every month, you’ll find news here about local and national events that are of interest to the software community …

  • Software Quality Calendar

    There are many organizations that sponsor monthly meetings, workshops, and conferences of interest to software professionals. Find out what’s happening…
  • Workshops Offered by Software Quality Consulting

    Software Quality Consulting offers workshops in many topics related to software process improvement. Get more info…


 

Software Quality Consulting provides consulting, training, and auditing services tailored to meet the specific needs of clients. We help clients fine-tune their software development processes and improve the quality of their software products. The overall goal is to help clients achieve Predictable Software Development™ – so that organizations can consistently deliver quality software with promised features in the promised timeframe.

To learn more about how we can help your organization, visit our web site or send us an email.


I hope this newsletter has been informative and helpful. Your comments and feedback are most welcome. Send me your feedback…

Thanks,


Steve Rakitin

info@swqual.com


Food for Thought, Predictable Software Development, Act Like a Customer,
and ALAC are trademarks of Software Quality Consulting, Inc.
Copyright 2005. Software Quality Consulting, Inc. All rights reserved.
Graphic design by Sage Studio