
Book Review

Managing Maintenance Error : A Practical Guide


Managing Maintenance Error

A Practical Guide

By: James Reason and Alan Hobbs

Hardcover: 183 pages
Dimensions (in inches): TBD
Published by: Ashgate Publishing Company
Publication Date: February 2003
ISBN: 0754615901

Contents

  • Chapter 1 - Human Performance Problems in Maintenance
  • Chapter 2 - The Human Risks
  • Chapter 3 - The Fundamentals of Human Performance
  • Chapter 4 - The Varieties of Error
  • Chapter 5 - Local Error-provoking Factors
  • Chapter 6 - Three System Failures and a Model of Organizational Accidents
  • Chapter 7 - Principles of Error Management
  • Chapter 8 - Person and Team Measures
  • Chapter 9 - Workplace and Task Measures
  • Chapter 10 - Organizational Measures
  • Chapter 11 - Safety Culture
  • Chapter 12 - Making it Happen: The Management of Error Management

Our Review

This book has the potential to revolutionise the way that we think about Failure Elimination, in the same way that John Moubray's book, Reliability Centered Maintenance, has revolutionised the way we think about Failure Prediction and Prevention. If you are at all interested in Failure Prevention within your organisation, then this is one book that you simply MUST read.

The authors of this book, Professor James Reason and Alan Hobbs, are not engineers - rather, they are Behavioural Psychologists, and specialists in Human Performance. James Reason is Professor Emeritus of Psychology at the University of Manchester, England, and Alan Hobbs, after spending 10 years as a human performance investigator at the Australian Bureau of Air Safety Investigation, is now a Senior Research Associate at the San Jose State University Foundation at the NASA Ames Research Center, California. Yet it is clear that they have spent much of their careers working with maintenance personnel and dealing with Maintenance issues. It is precisely because they are NOT engineers that this book is so valuable - they bring to this otherwise technical field a refreshing, practical approach based on sound thinking, which is in turn built on the solid advances made in recent decades in Behavioural Psychologists' understanding of human behaviour - advances that, to date, most of us engineers have either been ignorant of, or have studiously ignored.

Why is human error important? Numerous studies have indicated that a significant proportion of equipment failures occur shortly after some maintenance action has taken place. Nowlan and Heap's ground-breaking research at United Airlines in the 1970s (which culminated in the publication of their 1978 report, and the birth of a new technique we now call Reliability Centered Maintenance) indicated that 72% of the aircraft components that they examined had a higher than average probability of failure shortly after maintenance had been performed on them (however, Nowlan and Heap did not exhaustively examine the reasons why this may be so). Reason and Hobbs add to the research of Nowlan and Heap by quoting statistics from the US and Japanese Nuclear Power industries, which indicate that in three out of four of those studies, maintenance errors accounted for more than 50% of the root causes of potentially serious events. They also state that it has been estimated that maintenance errors ranked second only to controlled flight into terrain accidents in causing onboard aircraft fatalities between 1982 and 1991 (despite the application of RCM techniques in the airline industry during this period). At coal-fired power stations, according to another quoted study, 56% of forced outages occur less than a week after a planned or maintenance shutdown. Maintenance errors, conclude Reason and Hobbs, with some justification, "not only endanger lives and assets, they are extremely bad for business".

This book is paradigm-shifting material. Think of the traditional engineering approach to dealing with maintenance error, and most engineers tend to think along two lines - either discipline/counsel/train the individual(s) involved, and/or write a new procedure/work instruction to make sure that it doesn't happen again. Reason and Hobbs clearly show, in this book, why neither of these approaches is likely to be successful in eliminating maintenance error. But they do suggest some more effective, practical alternatives. In fact, as the subtitle to this book ("A Practical Guide") suggests, this is a remarkably practical book aimed at maintenance professionals, as well as being thoroughly well grounded in solid applied research.

This book builds on and expands the material first outlined in Chapter 5 of Reason's book "Managing the Risks of Organizational Accidents", and, in introducing the subject, the first chapter of Managing Maintenance Error largely reiterates the findings and conclusions of that chapter.

Chapter 2 moves on to cover some of the fundamental design attributes of human beings, and explains why maintenance activities can be particularly error-provoking. In particular, it argues the futility of trying to change the human condition, when a more effective way of managing maintenance error is to treat errors as a normal, expected, and foreseeable aspect of maintenance work, and therefore, manage maintenance error by changing the conditions under which that work is carried out.

Chapter 3 introduces non-psychologists to some of the factors that control and influence human performance. The effects of fatigue, shift cycles and rosters, personality differences, biases in decision-making and information overload on human performance are all described here. This chapter also describes task performance in terms of three performance levels:

  • Skill based - where we are so skilled at performing a (normally routine) task that we can do it with minimal levels of conscious thought about the task itself (think of getting up and getting ready for work in the morning)
  • Rule based - where we "pattern match" pre-prepared rules or solutions to trained-for problems (the traffic light is red, so I must stop)
  • Knowledge based - where we make slow, conscious attempts to solve new or novel problems (I'm stuck in this traffic jam, what is the best way out of here)

In Chapter 4, Reason and Hobbs go on to describe how the types of errors that occur vary depending on the nature of the task that we are performing. For example, for skill-based tasks (where we are doing the task on auto-pilot, so to speak), we are more likely to forget where we were in the sequence (especially if we get interrupted), and run the risk of repeating steps in the task (cleaning our teeth twice), or more importantly, omitting steps altogether (not cleaning our teeth at all). We are also, with these types of tasks, likely to continue on and complete the entire sequence of steps, even in (unusual) situations where this is not appropriate (driving to work on Saturday, when we meant to drive to the golf course). These they describe as skill-based errors, and there are various types of these that occur - recognition failures, memory failures and slips.

They also describe, in this chapter, Mistakes (which they subclassify into rule-based mistakes and knowledge-based mistakes), and Violations (where someone made a conscious decision not to follow set procedures, although they (normally) did not intend that any serious consequences would result). Violations are subclassified into Routine violations, Thrill-seeking or Optimizing violations, and Situational violations.

These classifications are important in understanding the situations that led to the error arising, and also, in order to identify the most appropriate ways of dealing with the error.
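For readers who keep their own incident records, the classification summarised above can be written down as a small lookup table. The Python sketch below is only an illustration built from the categories named in this review. The names `ErrorClass`, `TAXONOMY`, `classify` and `is_deliberate_deviation` are hypothetical and do not come from the book, which may define the categories in more detail.

```python
from enum import Enum


class ErrorClass(Enum):
    """Top-level classes as summarised in this review (names are illustrative)."""
    SKILL_BASED_ERROR = "skill-based error"
    MISTAKE = "mistake"
    VIOLATION = "violation"


# Subtypes exactly as listed in the review text; the book may refine them.
TAXONOMY = {
    ErrorClass.SKILL_BASED_ERROR: (
        "recognition failure",
        "memory failure",
        "slip",
    ),
    ErrorClass.MISTAKE: (
        "rule-based mistake",
        "knowledge-based mistake",
    ),
    ErrorClass.VIOLATION: (
        "routine violation",
        "thrill-seeking or optimizing violation",
        "situational violation",
    ),
}


def classify(subtype: str) -> ErrorClass:
    """Return the top-level class for a subtype label (case-insensitive)."""
    wanted = subtype.strip().lower()
    for error_class, subtypes in TAXONOMY.items():
        if wanted in subtypes:
            return error_class
    raise ValueError(f"unknown error subtype: {subtype!r}")


def is_deliberate_deviation(error_class: ErrorClass) -> bool:
    """Only violations involve a conscious decision not to follow procedure."""
    return error_class is ErrorClass.VIOLATION
```

Tagging each recorded event with one of these subtypes, for example via `classify("memory failure")`, makes it easier to see which class of problem dominates, and so which kind of countermeasure is worth considering first.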

Chapter 5 discusses the key factors that are known to increase the frequency of maintenance errors. Included among these are:

  • Documentation problems
  • Time pressure
  • Poor Housekeeping and Tool Control
  • Inadequate Coordination and Communication
  • Fatigue
  • Inadequate Knowledge and Experience, and
  • Problems with Procedures

It also pays particular attention to the role of beliefs in promoting violations.

Chapter 6 describes three organizational accidents to which maintenance errors made a significant contribution - an Embraer 120 crash (where the upper attachment screws on the horizontal stabilizer were removed during maintenance, and not refitted), the Clapham Junction rail collision (where a replacement signal using new technology had been installed incorrectly), and the Piper Alpha explosion (where maintenance of a pressure relief valve was at the heart of the event). It then outlines a model of accident causation in which contributing influences cascade from Organizational Factors to Workplace Factors to Personal Factors, and which also includes Breached Defences.

Chapter 7 describes a number of fundamental principles of error management on which the remainder of the book is based. Included among these are:

  • Human error is both universal and inevitable
  • Errors are not intrinsically bad
  • You cannot change the human condition, but you can change the conditions in which humans work
  • The best people can make the worst mistakes
  • People cannot easily avoid those actions they did not intend to control
  • Errors are consequences, rather than causes
  • Many errors fall into recurrent patterns
  • Safety-significant errors can occur at all levels in the system
  • Error Management is all about managing the manageable
  • Error Management is about making good people excellent
  • There is no one best way
  • Effective Error Management aims at Continuous Reform rather than Local Fixes

Many of these principles are also embodied in the better Root Cause Analysis processes, in particular those expounded by Latino and Latino (of Reliability Center, Inc.) in their book Root Cause Analysis - Improving Performance for Bottom-Line Results, as well as those adopted by Assetivity.

Chapters 8, 9 and 10 then move on to describing the practical things that organizations can do to manage and minimize the impact of error in Maintenance at a Personal/Team level, Workplace level, and Organizational level, respectively. Various measures are described, including providing awareness training for individuals and work teams to make them more aware of the types of errors that they may be prone to make, putting in place measures to reduce the incidence of violations, Fatigue Management, Equipment Design/Maintainability, and so on. Chapter 9, in particular, has a very useful checklist that can be used to determine whether a particular maintenance task is likely to be prone to error, and suggests ways of designing tasks to minimise the chances that this error will occur. Chapter 10 describes various organisational tools that can be used to systematically and proactively avoid the consequences of error, including MEDA (Maintenance Error Decision Aid) and MESH (Managing Engineering Safety Health).

Chapter 11 describes the more important attributes of a Safe Culture. There is a lot of commonality in the findings in this chapter with those of research into High Reliability Organizations (see Managing the Unexpected - Assuring High Performance in an Age of Complexity for more on High Reliability Organizations), and this research is referred to in this chapter. In particular, Reason and Hobbs argue that a Safe Culture is made up of three component parts, a Just Culture (one that has agreed and understood the difference between blame-free and culpable acts), a Learning Culture (one in which both reactive and proactive techniques are used to guide continuous and wide-reaching improvements), and a Reporting Culture (one where there is an atmosphere of trust in which people are willing to confess their errors and near-misses). It also argues that, while there is a widespread belief that values, attitudes and beliefs drive behaviour, in many cases it can also work the other way around (they argue that, for example, the reason that many people have given up smoking was not necessarily because they believed that smoking was bad for them, but because they found themselves increasingly socially marginalised because of the limitations in the places where they were permitted to indulge their habit).

Finally, in Chapter 12, the authors pull together all the information from the previous chapters, and focus on the toughest part of error management - making it happen, and keeping it going. It includes a Checklist for Assessing Institutional Resilience (CAIR), which can be used to assess whether your organization is likely to be more or less immune from maintenance mishaps.

In conclusion, this is a 5 star book - absolutely essential reading for anyone wishing to make significant improvements in equipment (and human) reliability and performance. Written by psychologists, for engineers, it is intensely practical, enlightening and highly valuable.


Copyright 1996-2009, The Plant Maintenance Resource Center. All Rights Reserved.
Revised: Thursday, 08-Oct-2015 12:08:06 AEDT