Maintenance Task Selection - Part 3
Summarised by: Sandy Dunn
Webmaster, Plant Maintenance Resource Center
This is part two of a summary from the plantmaint Maintenance discussion forum, which discusses alternative approaches to Maintenance task selection, including RCM, PM Optimisation (PMO), RCMCost, and others. It also touches on Total Productive Maintenance (TPM).
From: John Moubray
Much of what has been written, especially by Steve Turner and Sandy Dunn, in
response to your initial questions about RCM invites further comment and
clarification. So here goes.
1: RCM in commercial aviation
Steve Turner states that "RCM was originally developed as a design tool....
". This comment is simply not true. To place it in perspective, however, it
is necessary to review the evolution of RCM inside and outside the
commercial aviation industry.
Before doing so, it is perhaps worth noting that Mr Turner bases his
comments on his ten years of experience in RCM and on his copy of Nowlan &
Heap's report. For the record, I also possess a copy of Nowlan & Heap's
report. In addition, Stan Nowlan himself was my personal mentor in this
field, from 1981 until his death in 1995. I am also currently serving on a
working group commissioned by the US Air Transport Association to review
MSG3, which is the name of the process used by the airlines to develop
maintenance programs for commercial aircraft. The comments that follow
reflect this experience.
RCM finds its roots in the early 1960's. The initial development work was
done by the North American civil aviation industry. The airlines at that
time began to realise that many of their maintenance philosophies were not
only too expensive but also actively dangerous. This realisation prompted
the industry to put together a series of "Maintenance Steering Groups" to
re-examine everything they were doing to keep their aircraft airborne. These
groups consisted of representatives of the aircraft manufacturers, the
airlines and the FAA.
The first attempt at a rational, zero-based process for formulating
maintenance strategies was promulgated by the Air Transport Association in
Washington DC in 1968. The first attempt is now known as MSG 1 (from the
first letters of Maintenance Steering Group). A refinement - now known as
MSG 2 - was promulgated in 1970.
In the mid-1970's the US Department of Defence wanted to know more about the
then state of the art in aviation maintenance thinking. They commissioned a
report on the subject from the aviation industry. As mentioned by Ron
Doucet, this report was written by Stanley Nowlan and Howard Heap of United
Airlines. They gave it the title "Reliability Centered Maintenance". The
report was published in 1978, and I agree with Steve Turner's comment that
it is still one of the most important documents - if not the most
important - in the history of physical asset management. It is available
from the US Government National Technical Information Service, Springfield, Virginia.
Nowlan & Heap's report represented a considerable advance on MSG 2 thinking.
It was used as a basis for MSG 3, which was promulgated in 1980. MSG 3 has
since been revised twice. Revision 1 was issued in 1988 and revision 2 in
1993. It is used to this day to develop prior-to-service maintenance
programs for new aircraft types (recently including the Boeing 777 and
Copies of MSG 3 revision 2 are available from the Air Transport Association, Washington DC.
Several points to note from this history:
- the term "reliability centered maintenance" is not used in commercial
aviation. The term does not even appear in the MSG3 document. In fact, there
are some fundamental differences between RCM (both as it is described in the
Nowlan & Heap report and as it is described in SAE JA1011) and MSG3. Many of
these differences find their roots in assumptions made about the training
and skills of the maintenance technicians found in commercial aviation and
in other industries. (In general, the former are much more highly trained
and their training is focused - far more so than usual - on the needs of a
specific industry. As a result, many of the tasks in maintenance programs
developed using MSG3 are described and grouped in ways that have precisely
defined meanings within commercial aviation but that are meaningless outside it.)
- the Nowlan & Heap report was commissioned by the US Department of Defence.
Consequently, as the authors well knew at the time, it was written
specifically for use by people outside the commercial aviation industry
- despite the differences between MSG3 and RCM, both are very much tools
used for the development of maintenance programs, not design tools. For
instance, the first words in the preface to Nowlan & Heap's report are "This
volume provides the first full discussion of reliability-centered
maintenance as a logical discipline for the development of scheduled
maintenance programs." The present title of the MSG3 document is
"Maintenance Program Development Document (MSG-3) Revision 2". (My italics.)
The question of design only enters into the application of both processes
when it is apparent that maintenance cannot provide a satisfactory way of
dealing with failure modes identified during the FMEA.
2: Maintenance strategy formulation in the US nuclear power industry.
One justification put forward by Steve Turner (and others) for using
retroactive or reverse RCM approaches like PMO 2000 is that they have been
used in the US nuclear power industry. I personally have not applied RCM in
a US nuclear power station (although our network has assisted nuclear
facilities with the application of RCM II in other countries.) Consequently,
I cannot comment personally on the application of RCM in the US nuclear
environment. However, I am able to draw your attention to comments made by
Dr David Worledge, who perhaps knows more than anyone else about the
application of RCM to US nuclear power stations. Dr Worledge worked for the
Electric Power Research Institute (EPRI) from 1981 to 1995, and headed the
initial pilot applications of RCM in US nuclear power stations from 1982 to
1985, and their subsequent development. (He now works as an independent consultant.)
On 24 and 25 August 1999, an RCM conference was held in Denver, Colorado. It
was organised by Electric Utility Consultants Inc and was aimed exclusively
at the electricity transmission and distribution sectors. The speakers were
a whole variety of RCM consultants and end-users, myself included. I made a
few comments about RCM in the US nuclear power industry during a
presentation on the history of RCM. After this presentation, Dr Worledge
stood up to make some additional comments from the floor.
The gist of his comments was as follows: The initial maintenance programs in
US nuclear power plants were developed in conventional fashion, relying
heavily on vendor recommendations. Continuing efforts to enhance safety and
reliability, and ever increasing regulatory requirements resulted in utility
management at some plants questioning whether the overall result was a
significant degree of over-maintenance. By the early 1980's, the nuclear
power industry often seemed to be faced with a choice of either generating
power or doing the prescribed PM. They had to find a way of reducing the PM
workloads quickly without prejudicing safety or reliability.
EPRI became aware of the Nowlan & Heap report entitled "Reliability-centered
Maintenance", which was published in 1978. This seemed to offer a solution
to their problem. However, after initial applications of "classical RCM" by
EPRI, many plants developed their own methods for maintenance optimization,
some of which departed from RCM principles. Dr Worledge stressed that, to bring
some order to this situation, it became EPRI's objective to reduce PM
workloads using standardized but "streamlined" approaches which took
advantage of certain features of the design of nuclear power plants, but
which kept close to the philosophy of classical RCM. They took the view that
high levels of redundancy in their safety systems, high levels of
regulator-imposed failure-finding tasks, and the fairly simple mission of
the power generating systems at such plants could validly support certain
simplifications of the methodology. They also took the view that at least in
older plants the existing operating experience had encountered all
reasonably likely failure modes, further supplemented in some cases by
comprehensive risk assessments and very detailed record keeping carried out
by the nuclear power industry itself. In addition, each plant already had a
detailed system functional review performed in its Final Safety Analysis
Report, as part of obtaining its operating license. Consequently, they felt
that the function analysis and the FMEA steps embodied in the RCM process
could be simplified.
A further notable aspect of their situation was that in most plants all the
key protective devices and safety systems used in nuclear power stations
were already covered by fairly comprehensive maintenance programs. As a
result, there was often more interest in removing superfluous maintenance
activities, which in some cases were actually damaging to reliability and
availability of safety systems. However, there was still a drive to improve
reliability in power generating systems because the industry was in need of
increasing plant capacity factors.
The most abbreviated of three streamlined approaches (recommended by EPRI
in EPRI TR-105365, September 1995) modified the RCM process by setting up a
list of simple functional questions such as "does this component failure
lead to a plant trip, or to a power reduction of >5%, the loss of a safety
function, to a plant transient, or a personnel hazard, or a delay in
start-up", etc, without further functional analysis. Two additional
streamlined approaches in the EPRI report closely resembled classical RCM
with some liberties taken over documentation, and the early separation of
clearly less important components.
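As a rough illustration, the screening questions quoted above could be captured as a simple checklist. This is only a sketch: the class and function names are hypothetical, and treating any "yes" answer as grounds for flagging the component is an assumption, since the questions are listed here without a statement of how the answers were combined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailureEffects:
    """Hypothetical record of what a component failure leads to."""
    plant_trip: bool = False
    power_reduction_pct: float = 0.0
    loss_of_safety_function: bool = False
    plant_transient: bool = False
    personnel_hazard: bool = False
    startup_delay: bool = False


def screening_answers(effects: FailureEffects) -> dict[str, bool]:
    """Answer each of the quoted functional questions for one component."""
    return {
        "plant trip": effects.plant_trip,
        "power reduction of >5%": effects.power_reduction_pct > 5.0,
        "loss of a safety function": effects.loss_of_safety_function,
        "plant transient": effects.plant_transient,
        "personnel hazard": effects.personnel_hazard,
        "delay in start-up": effects.startup_delay,
    }


def is_flagged(effects: FailureEffects) -> bool:
    """Assumption: any single 'yes' flags the component for further attention."""
    return any(screening_answers(effects).values())


# Hypothetical component whose failure causes a 7.5% power reduction.
print(is_flagged(FailureEffects(power_reduction_pct=7.5)))
```

Note that nothing in this checklist asks what the component is meant to do, which is the sense in which the approach proceeds "without further functional analysis".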
A further approach, which some have described as "reverse RCM" and in which existing
PM tasks are simply re-examined as to their utility and cost effectiveness,
was sanctioned by EPRI only in urgent situations (under the name Outage
Management Assessment) to try to reduce the pressing workload for an
upcoming, already scheduled, refueling outage. Reverse RCM was never
recommended by EPRI for general use, and did not form part of its
recommended streamlined approaches.
Dr Worledge concluded his remarks by saying that in his opinion, these
processes achieved their limited objectives in the nuclear power industry,
in that they led to very substantial reductions in PM workloads without
appearing to prejudice safety or reliability. However, he then went on to
express the opinion that caution should be exercised when a process
developed to solve a very specific set of problems in the unique environment
of the US nuclear power industry is proposed for use in other industries -
such as oil and gas, thermal power generation and electricity T&D - where
the same initial conditions may not apply.
3: The SAE RCM standard
Since the Nowlan & Heap report was published, a great many processes have
emerged that claim to be RCM. Many of them bear little or no resemblance to
the process described by Nowlan & Heap. This became a cause of grave concern
to many organisations. In particular, the US Naval Air Command (Navair),
which was one of the sponsors of the original N&H report, found that some
vendors were using all sorts of weird and wonderful processes which they
described as "RCM" to develop maintenance programs for equipment that they
were selling to Navair. (The history of RCM in the US military has been ably
described by Dana Netherton, chairman of the SAE RCM committee, in articles
that appeared in maintenance journals in Australia, the USA and the UK.)
These aberrant RCM processes led Navair to approach the SAE - as a
recognised standards-setting institution with close ties both to the US
Military and to the aerospace sector - for help with the development of a
standard that could be used to define what is and what is not RCM. This
standard (SAE JA1011) was published in August 1999 and can be obtained from
the SAE at www.sae.org.
The standard is important because of a tendency for vendors of strategy
formulation processes other than RCM to compare their processes with RCM,
but without specifying which version of RCM. In particular, beware of
comparisons to something called "Classical" RCM. Nowhere in the literature
on this subject have I encountered a description of a process which is
specifically labelled "Classical RCM", so it seems to be a convenient
mirage. In some of the cases where the term has been used, it seems to refer
to an horrendously complicated variant of the process which not only calls
for the analysis to be carried out at far too low a level in the equipment
hierarchy, but also requires users to prepare complex (and usually
unnecessary) functional block diagrams before starting the analysis. Almost
any analytical process is likely to be an order of magnitude quicker than this.
All this means that when asked to compare any non-standard version of RCM
with RCM, care needs to be taken to establish whether the comparison is
being made with a version of RCM that complies with the SAE standard.
4: RCM in industries other than aviation and nuclear power
Sandy Dunn states that "I have recently come to the conclusion that, in
contrast to the position that is put forward by John Moubray, Ron Doucet and
others from the RCM II religion, the major problem is that RCM II (and by
definition RCM) has NOT been sufficiently adapted to meet the needs of
industry outside the airline industry."
Firstly, as discussed in section 1 above, RCM is not used by the airline
industry. Secondly, as discussed in section 3 above, the SAE RCM Standard
(not RCM II) defines what RCM is. (From now on, unless stated otherwise,
when I use the term RCM, I will be referring to any process - of which RCM
II is one - that complies fully with the SAE Standard.)
These two points apart, my view of the applicability of RCM is very
different. Together with Aladon's network of licensees, I have been directly
and indirectly involved with the application of RCM II on more than 1200
industrial sites spanning 42 countries. These applications have embodied the
performance of several thousand RCM analyses.
It is true to say that the application of RCM has not been successful in
every case. It can be said to have failed in about one third of the
organisations where it has been tried, either because the organisations
concerned did not derive the benefits that they hoped to from the RCM
process or the RCM initiative collapsed before it could yield much in the
way of significant results. In our experience, none of the initiatives that
failed did so for technical reasons. Without exception, the initiatives that
failed did so for organisational reasons. Of these, the two most common
reasons for failure are:
- the principal internal sponsor of the initiative quit the organisation or
moved to a different position before the new ways of thinking embodied in
the RCM process could be institutionalised
- the internal sponsor and/or the consultant who was acting as the change
agent could not generate sufficient enthusiasm for the process for it to be
applied in a way which would yield results.
Of course, if one third of these applications have failed, then two thirds
have been successful. This success rate is at least as good as, if not
better than, the success rate achieved by major organisational change
initiatives in general.
At this point, it is worth noting Sandy Dunn's observation that his
experience in Australia was that "for every ten organisations that started
to implement RCM II, only one ever implemented the process on anything other
than a "pilot project" scale". He also states that these failures were not
"due to any failure on the part of the consultant concerned." Firstly, my
records indicate that his personal experience of RCM II is indeed true. Only
about one tenth of the RCM II projects with which he personally was
associated went beyond the pilot stage. However, a great many other RCM
practitioners have been active in Australia for the past ten years, and
their collective experience is that about two thirds of the applications of
RCM to date have progressed well beyond the pilot stage (not 1 in 10). This
sharp contrast bears out my own experience gained from working with some 200
licensed RCM II practitioners worldwide over a period of 15 years - of whom
about 120 are currently active: there is in fact a high correlation between
the success rate of RCM II applications and the change management
capabilities of the consultants involved. (Among others, the British Royal
Navy, which is a major user of SAE-compliant RCM, has come to understand
that the capabilities of individual consultants are every bit as important
as the track record of their employers. So much so that the RN now insists
on interviewing at great length every RCM consultant that is to be placed at
their disposal, in addition to verifying the commercial bona fides of their employers.)
When discussing the "success" of RCM, we need to look at both the economic
benefits and the question of risk.
5a: The economic benefits of RCM
Kim, you are absolutely correct to observe that "... a lot of maintenance
decision makers I have met look mainly at the tangible returns (minimum
cost, minimum project duration) rather than the projected expected returns
of carrying out RCM." In fact, if RCM is correctly applied by properly
trained people working under the direction of a skilled facilitator, and the
project has been properly planned before it starts, it usually pays for
itself in between two weeks and two months. (In some cases, the payback
period has been measured in days and sometimes one or two years, but the
norm is weeks to months.) This is a very rapid payback indeed.
In nearly every case, these economic benefits flow from improved plant
performance rather than reductions in the direct cost of maintenance
(although very substantial reductions in direct maintenance costs have been
achieved in some cases, especially by military users). From the economic
point of view, improved plant performance can manifest itself in a variety
of ways, such as an increase in total throughput, a reduction in failure
rates, an increase in plant availability or a reduction in scrap rates. Some
examples are as follows:
- a small dairy products factory in Scotland: a 20% increase in total
throughput. This increased the contribution to group profits at plant level
by £1 million per annum, while the total cost of the project (including the
cost of the manhours spent undergoing training and attending review group
meetings) was less than £200 000. The analysis of this entire plant was
completed in three months
- a plant manufacturing steel wheels for automobiles in England:
productivity increased from 35 wheels per man per shift to 105 wheels per
man per shift in the space of six months (same machines, same people)
- a paper mill in Pennsylvania: a complex new boiler control system failed
five times in three years, shutting off steam (and co-generated electricity
supplies) to the paper mill, causing a complete mill shutdown. The total
cost of these failures was US$11 million, and the company had been unable to
solve the problems using conventional problem-solving techniques. They then
applied RCM. The project took six months, the RCM-derived recommendations
were implemented and there were no further failures in the ensuing three
years. The total cost of performing the RCM analysis and implementing the
proposed remedies was US$200 000. A $11m saving for an outlay of $200k
amounts to a payback of about one week
- an iron ore mine in Canada: at a recent conference in Toronto, Ron Doucet
cited the case of an RCM analysis of an ore crusher. The analysis cost
CDN$80 000 and led to an increase in throughput that was worth a nett $4.8
million per annum. A payback of less than a week.
- a microelectronics assembly plant in Malta: scrap rates on one production
line were reduced from 4% to 50 parts per million - an 800-fold reduction -
in the space of six weeks. Payback in this case was measured in weeks.
I could cite a great many other cases. (Sandy Dunn says that RCM "... is
complete overkill in most situations in most industries." If results like
these are overkill, then long live overkill.)
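The payback figures quoted in these examples follow from a simple undiscounted calculation: outlay divided by annual benefit. The sketch below reproduces it for three of the cases. The names are hypothetical, the dairy outlay is taken at its stated upper bound of £200 000, and the paper mill's US$11 million is treated as a single year's benefit, which is an assumption made to match the "about one week" quoted above.

```python
from dataclasses import dataclass

WEEKS_PER_YEAR = 52.0


@dataclass
class Case:
    name: str
    outlay: float          # one-off cost of the analysis and implementation
    annual_benefit: float  # benefit per year, in the same currency as outlay


def payback_weeks(case: Case) -> float:
    """Simple (undiscounted) payback period in weeks."""
    return case.outlay / case.annual_benefit * WEEKS_PER_YEAR


cases = [
    Case("Dairy products factory, Scotland", 200_000, 1_000_000),
    # Assumption: the US$11m of avoided losses is counted as one year's benefit.
    Case("Paper mill, Pennsylvania", 200_000, 11_000_000),
    Case("Iron ore crusher, Canada", 80_000, 4_800_000),
]

for case in cases:
    print(f"{case.name}: {payback_weeks(case):.1f} weeks")
```

If the paper mill's avoided losses were instead spread evenly over the three years in which the failures occurred, the same formula would give a payback of a little under three weeks, still well inside the "weeks to months" norm.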
Both Steve Turner and Sandy Dunn also state that RCM is only worth applying
in "high-risk" industries such as petrochemicals and oil & gas. Steve Turner
goes further, by suggesting that it is a waste of time to apply RCM to
mature plants. Suffice it to say that none of the above examples are from
"high risk" petrochemical-type industries, and all the plants concerned had
been in service for at least three years, and in some cases much longer.
Cost-effectiveness apart, another comment frequently made about true RCM is
that "it takes too long". For instance, at one point, Steve Turner writes:
"If you use PMO2000 you will have these (hazardous problems) under control
in one year, if you use traditional RCM it will take you six." This implies
that it would take six years to analyse all the equipment in a major
facility using true RCM. Suffice it to say that the world's largest coal
fired power station used RCM II to analyse all 65 of its major systems in a
period of 18 months, without losing a microwatt of generating capacity due
to the analysis and in circumstances where it was as difficult for them to
commit key resources to this process as it has been anywhere else in the
world. (Or I could cite the case of the two Malaysian CCGT power stations
that also analysed all their equipment in 18 months, or the large UK candy
factory - employing 55 maintenance craftsmen - that did likewise. And so on.)
5b: RCM and risk
Everyone who has commented on RCM seems to agree that it is a good tool for
developing maintenance programs in "high risk" situations. Sandy Dunn is
also correct when he says "I have heard it said (even by John Moubray
himself) that one cannot justify applying RCM to all equipment items - some
equipment items have such low impact on business risk that the effort
required to perform RCM analysis on them is greater than the potential
benefits." However, as those who have heard me speaking at conferences in
the recent past will be aware, my position on this subject is changing.
I am increasingly coming around to the view that no physical asset or system
can be deemed to be "low risk" unless it has been subjected at the very
least to a zero-based FMECA (and preferably a full RCM review) that proves
beyond a reasonable doubt that it is in fact low risk. There are two reasons
why my viewpoint has changed.
The first reason is actually a combination of factors: feedback from our
network concerning the results of the thousands of RCM II analyses that are
being performed around the world, and incidents in supposedly "low risk"
industries that have had very grave business implications.
The feedback from our network speaks of case after case of supposedly
innocuous systems that turn out to embody very surprising and potentially
deadly failure modes. In our experience, on average about 4% (1 in 25) of
failure modes are deemed to have direct safety or environmental
implications. We also frequently find that as many as 25% of failure modes
have potentially hazardous consequences but are not currently receiving any
form of PM. Most of the latter failure modes deal with protective devices
that have not been receiving any sort of attention prior to the
RCM II analysis. This issue is discussed further later.
(These data differ totally from those put forward by Steve Turner when he
says "Further to this, in my ten or so years of facilitating RCM analysis, I
have put about 1 in every 200 failures in the hazard category. Of these,
only once have I ever felt the RCM team had uncovered a potential hazard
that was not receiving any PM. My rough calculations tell me that the
benefit of RCM over PMO2000 is the one new hazard found in 15,000 failure
modes." It is also worth noting that although Steve Turner did attend
Aladon's RCM II practitioners' course, he always seems to compare PMO 2000
with one of the forms of "Classical" RCM.)
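To see how far apart the two sets of figures are, it may help to apply them to the same number of failure modes. The sketch below does this for a hypothetical analysis of 1,000 failure modes. The analysis size and the names are assumptions for illustration; the rates are the ones quoted above.

```python
def expected_count(failure_modes: int, rate: float) -> float:
    """Expected number of failure modes falling in a category at a given rate."""
    return failure_modes * rate


FAILURE_MODES = 1_000  # hypothetical size of an analysis

rates = {
    "Network: direct safety/environmental implications (about 4%)": 0.04,
    "Network: hazardous but receiving no PM (as many as 25%)": 0.25,
    "Turner: hazard category (1 in 200)": 1 / 200,
    "Turner: new hazard found (1 in 15,000)": 1 / 15_000,
}

for label, rate in rates.items():
    print(f"{label}: {expected_count(FAILURE_MODES, rate):.2f}")
```

On these figures the same analysis would be expected to turn up either dozens to hundreds of failure modes needing attention, or a handful with almost no new hazards, depending on whose rates are used.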
What about the supposedly "low risk" industries? Two sectors that are
frequently said to be "low risk" - and hence not worth rigorous analysis -
are automobile factories and food plants. In fact, simply reading the
newspapers shows how inappropriate it is to dismiss either of these
industries as low risk, as the following examples indicate:
- the boiler that blew up (during a maintenance inspection) at Ford's River
Rouge plant in Detroit in February 1999, killing six people and shutting the
plant down for 1.5 weeks. A huge business risk.
- the failure of the Firestone tyres on Ford Explorers which has been partly
attributed to the design of the tyres, partly (and arguably) to the
pressures at which the tyres were operated and partly (mainly?) to failures
(failure modes) in the manufacturing process used to produce the tyres in
one plant. These failures pose a serious threat to the continued existence
of Firestone as a company - perhaps the ultimate business risk
- the failure of a filter used in the Perrier water bottling plant in
France, leading to the recall of hundreds of thousands of bottles of Perrier
water at enormous cost to the company
- the contamination (another failure mode) of pallets used by Coca Cola in
Belgium, leading again to a massive and very expensive product recall, in
addition to seriously damaging the reputation of the company in Europe.
Note that all these failures involve the failure of physical assets. In the
case of the Coca Cola plant, it was pallets, which are just the sort of
simple, massively redundant items that are likely to be dismissed as
"non-critical" (until after the event).
The second reason why my views on criticality are changing concerns the
legislative environment in which more and more users of physical assets are
operating. The reaction of society as a whole to equipment failures is
changing at warp speed as we move into the 21st century. The changes began
with sweeping legislation governing industrial safety, mainly in the 1970's.
Among the best known examples of such legislation are the Occupational
Safety and Health Act of 1970 in the United States and the Health and Safety
at Work Act of 1974 in the United Kingdom. These Acts are fairly general in
nature, and similar laws have been passed in nearly all the major
industrialised countries. Their intent is to ensure that employers provide a
generally safe working environment for their employees.
These Acts were followed by a second wave of more specific safety-oriented
laws and regulations such as OSHA Regulation Nº 1910.119: "Process Safety
Management of Highly Hazardous Chemicals" in the United States and the
"Control of Substances Hazardous to Health Regulations" in the United
Kingdom. Both of these regulations were first promulgated in the early to
mid-1990's. They are noteworthy examples of a then-new requirement for the
users of hazardous materials to perform formal analyses or assessments of
the associated systems, and to document the analyses for subsequent
inspection if necessary by regulators.
These two sets of developments represent a steady increase in legal
requirements to exercise - and to be able to demonstrate that we are
exercising - responsible custodianship of the assets under our control. They
have placed a significant burden on the managers of the assets concerned.
However, they reflect the rising expectations of society in terms of
industrial safety, and we have no choice but to comply as best we can.
It would be nice if it all ended there, but unfortunately this tide has not
stopped rising. The late 1990's have seen even more changes, this time
concerning the sanctions that society now wishes to impose if things go
wrong. Until the mid-90's, if a failure occurred whose consequences were
serious enough to warrant criminal proceedings, the proceedings usually
ended at worst with a substantial fine imposed on the organisation found to
be at fault, and the matter - at least from the criminal point of view -
usually ended there. (Occasionally, the organisation's permit to operate was
withdrawn, as in the case of the ValuJet airline after the crash in Florida
on 11 May 1996. This effectively put the airline out of business in its
However, following recent disasters, a movement is now developing not only
to punish the organisations concerned, but also to impose criminal sanctions
on individual managers. In other words, under certain circumstances,
individual managers can be sent to prison in connection with equipment
failures that have sufficiently nasty consequences. Stephen Young has
mentioned the pending legislation in the States of Victoria and Queensland
in Australia, which propose custodial sentences not only for specific
individuals, but for whole teams of people. Ron Doucet also mentioned the
changes to the Evidence Act in Victoria.
Legislative developments of this sort have not only taken place in
Australia. For instance, in the United Kingdom, John Prescott, the Minister
of Transport, has stated that in the light of the official inquiry into the
Paddington rail crash that occurred in 1999, he will introduce a law for a
crime to be called 'corporate killing', part of which will entail prison
sentences for specific executives. In the United States, following the
outcry about the accidents involving tire tread separation on SUV's, section
30170 of the "Motor Vehicle and Motor Vehicle Defect Notification Act" was
revised in October 2000 to include prison sentences of up to 15 years for
"directors, officers or agents" of vehicle manufacturers who commit
specified offences in connection with vehicles that fail in a way that
causes death or bodily injury.
There is considerable controversy about the reasonableness of these
initiatives, and even some doubt about their ultimate enforceability.
However, from the point of view of people involved in the management of
physical assets, the issue is not what is reasonable, but that we are
increasingly being held personally accountable for actions that we take on
behalf of our employers. Not only that, but if we are called to account in
the event of a serious incident, it will be in circumstances that could
culminate in jail sentences.
(Kim, in this context, you were actually not joking when you wrote "With all
this talk of litigation it's amazing we don't have the company legal eagles
doing reviews of their equipment strategies". I know of at least one major
petrochemical company that requires all FMEA's to be reviewed by the
company's lawyers before they are signed off.)
The message to us all is that society is getting so sick of industrial
accidents with serious consequences that not only is it seeking to call
individuals as well as corporations to account, but (in the case of the
Victoria Evidence Act) that it is prepared to alter well-established
principles of jurisprudence to do so. Under these circumstances, everyone
involved in the management of physical assets needs to take greater care
than ever to ensure that every step they take in executing their official
duties is beyond reproach. It is becoming professionally suicidal to do otherwise.
6: Planned Maintenance Optimisation
As explained by Steve Turner, PMO starts not by defining the functions of
the asset (as specified in the SAE RCM Standard) but with the
existing maintenance tasks. Users of this approach are then asked to try to
identify the failure mode that each task is supposed to be preventing, and
then work forward again through the last three steps of the RCM decision
process to re-examine the consequences of each failure and (hopefully) to
identify a more cost-effective failure management policy. (This approach is
what is most often meant when the term 'streamlined RCM' is used. It is also
known as "backfit" RCM or "RCM in reverse".)
These retroactive approaches are superficially very appealing, so much so
that I tried them myself on numerous occasions when I was new to RCM.
However, in reality they are also among the most dangerous of the
streamlined methodologies, for the following reasons:
- they assume that existing maintenance programs cover just about all the
failure modes that are reasonably likely to require some sort of preventive
maintenance. In the case of every maintenance program that I have
encountered to date, this assumption is simply not valid. If RCM is applied
correctly, it transpires that nowhere near all of the failure modes that
actually require PM are covered by existing maintenance tasks. As a result,
a considerable number of tasks have to be added. Most of the tasks that are
added apply to protective devices, as discussed below. (Other tasks are
eliminated because they are found to be unnecessary, or the type of task is
changed, or the frequency is changed. The nett effect is usually a reduction
in perceived PM workloads, typically by between 40% and 70%.)
- when applying retroactive RCM, it is often very difficult to identify
exactly what failure cause motivated the selection of a particular task, so
much so that either inordinate amounts of time are wasted trying to
establish the real connection, or sweeping assumptions are made that very
often prove to be wrong. These two problems alone make this approach an
extremely shaky foundation upon which to build a maintenance program.
- in re-assessing the consequences of each failure mode, it is still
necessary to ask whether "the loss of function caused by the failure mode
will become evident to the operating crew under normal circumstances". This
question can only be answered by establishing what function is actually lost
when the failure occurs. This in turn means that the people doing the
analysis have to start identifying functions anyway, but they are now trying
to do so on an ad hoc basis halfway through the analysis. If they do not,
they start making even more sweeping - and hence often incorrect -
assumptions that add to the shakiness of the results.
- retroactive approaches are particularly weak on specifying appropriate
maintenance for protective devices. As stated on page 172 of the second
edition of my book on RCM: "at the time of writing, many existing
maintenance programs provide for fewer than one third of protective devices
to receive any attention at all (and then usually at inappropriate
intervals). The people who operate and maintain the plant covered by these
programs are aware that another third of these devices exist but pay them no
attention, while it is not unusual to find that no-one even knows that the
final third exist. This lack of awareness and attention means that most of
the protective devices in industry - our last line of protection when things
go wrong - are maintained poorly or not at all." So if one uses a
retroactive approach to RCM, in most cases a great many protective devices
will continue to receive no attention in the future because no tasks were
specified for them in the past. Given the enormity of the risks associated
with unmaintained protective devices, this weakness of retroactive RCM alone
makes it in my opinion completely indefensible. (Some variants of the
retroactive approach - such as S-RCM - try to get around this problem by
specifying that protective systems should be analysed separately, often
outside the RCM framework. This gives rise to the absurd situation that two
analytical processes have to be applied in order to compensate for the
deficiencies created by attempts to streamline one of them)
- more so than any of the other streamlined versions of RCM, retroactive
approaches focus on maintenance workload reduction rather than plant
performance improvement (which is the primary goal of function-oriented true
RCM). Since the returns generated by using RCM purely as a tool to reduce
maintenance costs are usually lower - sometimes one or two orders of
magnitude lower - than the returns generated by using it to improve
reliability, the use of the ostensibly cheaper retroactive approach becomes
self defeating on economic grounds, in that it virtually guarantees much
lower returns than true RCM.
In nearly all cases, the proponents of the retroactive approaches to RCM
claim that these approaches can produce much the same results as true RCM in
much less time. (Steve Turner claims that PMO is six times quicker, although
he compares PMO with "Classical" RCM, not RCM II.) However, the above
discussion indicates that not only do they produce nothing like the same
results as true RCM, but that they contain logical or procedural flaws which
increase risk to an extent that overwhelms any small advantage they might
offer in reduced application costs. It also transpires that if one seeks to
avoid making some of the more gratuitous assumptions required by retroactive
techniques, they actually end up taking longer and costing more to apply
than true RCM, so even this small advantage is lost. As a result, the
business case for applying retroactive RCM is suspect at best.
However, a rather more serious point needs to be borne in mind when
considering these techniques. The very word 'streamline' suggests that
something is being omitted. (For instance, Steve Turner states that PMO
usually omits the function identification step, and that as a result, it only
identifies half of the reasonably likely failure modes that would be
identified using even 'Classical' RCM.) In other words, there is to a
greater or lesser extent a degree of sub-optimisation embodied in all of
Leaving things out inevitably increases risk. More specifically, it
increases the probability that an unanticipated failure, possibly one with
very serious consequences, could occur. If this does happen, as suggested
above, managers of the organisation involved are increasingly likely to find
themselves called personally to account. If the worst comes to the worst,
they will not only have to explain, often in an emotionally-charged
courtroom confronted by bitterly hostile legal Rottweilers, what went wrong
and why. They will also have to explain why they deliberately chose a
sub-optimal decision-making process to establish their asset management
strategies in the first place, rather than using one which complies fully
with a Standard set by an internationally-recognised standards-setting
organisation. It would not be me that they would have to convince, not their
peers and not their managers, but a judge and jury.
One rationale often advanced for using the streamlined methods is that it is
better to do something than to do nothing. However, this rationale misses
the point that all the analytical processes described above, retroactive or
otherwise, require their users to document the analyses. This means that a
clear audit trail exists showing all the key information and decisions
underlying the asset management strategy, in most cases where no such
documentation has existed before. If a sub-optimal approach is used to
formulate these strategies, the existence of written records makes every
shortcut much clearer to any investigators than they would otherwise have
been. (This in turn may suggest that perhaps we should simply forget about
all of these formal analytical processes. Unfortunately, the demand for
documented analyses embodied in the second wave of safety legislation
mentioned above does not allow us this option.)
A further rationale for streamlining says something like "we have been using
this approach for a few years now and we haven't had any accidents, so it
must be all right." This rationale betrays a complete misunderstanding of
the basic principles of risk. Specifically, no analytical methodology can
completely eliminate risk. However, the difference between using a more
rigorous methodology and a less rigorous methodology may be the difference
between a probability of a catastrophic event of one in a million versus one
in ten thousand. In both cases, the event may happen next year or it may not
happen for thousands of years, but in the second case, it is a hundred times
more likely. If such an event were to happen, the user of a form of RCM that
complies with the SAE Standard would be able to claim that he or she
exercised prudent, responsible custodianship by applying a rigorous process
that complies with an internationally recognised standard, and as such would
be in a highly defensible position. Under the same circumstances, the user
of any "downsized" and hence non-compliant form of RCM is on much, much
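The arithmetic behind the risk comparison above can be sketched in a few lines. This is only an illustration: it assumes the two quoted figures are per-year probabilities and that years are independent, neither of which the post states, and the 30-year horizon and the function names are hypothetical.

```python
# Rough sketch of the "one in a million versus one in ten thousand" comparison.
# ASSUMPTION: the quoted figures are annual probabilities and years are independent.

P_MORE_RIGOROUS = 1e-6   # one in a million (figure quoted in the post)
P_LESS_RIGOROUS = 1e-4   # one in ten thousand (figure quoted in the post)


def likelihood_ratio(p_high: float, p_low: float) -> float:
    """How many times more likely the event is under the higher probability."""
    return p_high / p_low


def prob_at_least_once(p_per_year: float, years: int) -> float:
    """Probability of one or more events over `years` independent years."""
    return 1.0 - (1.0 - p_per_year) ** years


if __name__ == "__main__":
    ratio = likelihood_ratio(P_LESS_RIGOROUS, P_MORE_RIGOROUS)
    print(f"Less rigorous method: about {ratio:.0f} times more likely")

    horizon = 30  # hypothetical asset life in years, chosen only for illustration
    for label, p in (("more rigorous", P_MORE_RIGOROUS),
                     ("less rigorous", P_LESS_RIGOROUS)):
        print(f"{label}: P(at least one event in {horizon} years) = "
              f"{prob_at_least_once(p, horizon):.6f}")
```

Under either figure the event remains unlikely in any given year, which is why a few accident-free years say little about which methodology is in use.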
It is interesting to note that all but one of the people who have chosen to
comment at length in this discussion (myself included) are consultants.
Consultants of course have commercial axes to grind, which will lead many
readers to say "well, he would say that, wouldn't he." This leads me to make
two final suggestions in closing:
- take special note of the views of the one commentator who is a practising
maintenance manager and hence who does not have a commercial axe to grind,
but who feels strongly enough about all this stuff - based on personal
experience - to share his thoughts at length (Ron Doucet of the Iron Ore
Company of Canada), and
- if you really want to satisfy yourself about the relative merits of each
approach, try them both on a pilot scale, preferably on the same type of
equipment. Look at the outcomes in terms of documented maintenance programs
(with a special eye on defensibility) and in terms of benefits achieved
related to costs. Then make up your own mind.
With best wishes for the festive season
From: Andrew Jardine
I was very pleased to read the RCM overview by John Moubray. It
certainly helped me to put much of the recent correspondence in perspective.
From: Peter Ball
When I casually suggested to Stephen recently "Grasp the subject - the words will follow", little did I envisage the 6912-word explanation from JM himself, in defence of his RCM.
There seems to be no end of grief outpouring concerning this Classical vs Streamlined RCM, which now incorporates Conventional PMO, Reverse RCM, and even "RETROACTIVE RCM". The big funny of it all seems to be the ordained role of SAE to provide a Standard (JA1011) to prop up "Classical RCM". Ho Ho Ho!
Streamlined RCM appears to be considered anathema.
My view, for what it is worth, is that there is no BIG DEAL here.
Put to one side all of these "versions" of the methodology, and get back to BASICS.
Consider basic reliability centred maintenance. All that is required is a sensible mechanical (or electrical) engineer with access to the plant asset register, and the accounts department.
The tools needed include:
- FMECA (Failure Mode Effects and Criticality Analysis),
- FTA (Fault Tree Analysis),
- Pareto Analysis,
- Block Diagrams,
- Weibull Analysis, and
- an understanding of Risk / Cost Management.
No need for RCM trained teams and in-house facilitators; or software that comes only with the training.
Do not overlook the human aspects of reliability; Good Management providing a Good working environment usually results in Good Reliability.
In the mid to late 80's I introduced 'basic' reliability centred maintenance into the Australian Uranium mining industry. The word Reliability up until then was a management consideration of employee performance. Using the above 'tools' I developed the appropriate maintenance strategies, with the end result that the insurance cover was withdrawn from the underwriters, the first-year rewards were published as in excess of AU$1 million, and things stopped breaking down.
Happy 2001 to Everyone.
From: Trevor Hislop
Three cheers for common "basic" sense from Peter!
From: Dana Netherton
I sympathize with the sentiments ... but sentiments are no
substitute for judgement, especially when valuable and dangerous
physical assets are at stake.
I know I'm new to this forum, so a word of introduction may be in
order. I'm Dana Netherton, the chair of the SAE subcommittee that
wrote the RCM standard. I understand that Peter (Ball) has "used
my name in vain" a few times in the past. I finally thought I would
take a look for myself. :-)
In the interests of fair play, I should say a little about my
background and setting. I started my working life in US Navy
nuclear submarines (naval officer). After I left the Navy and finished
some other academic studies, I went to work for an American
consulting firm with US Navy contracts. Back in the late 1970s,
that firm introduced the US Navy to RCM, and so a few years after I
joined the firm, about 12 years ago, I began working in the field of
maintenance management consulting and RCM. Most of my SAE
committee's substantive work on the RCM standard was done
while I worked for that firm. My role on the committee was to
protect the interests of our US Navy client.
In recent years, that company has become enamoured of a
process similar to the "reverse RCM" that John Moubray described
in his very thorough posting of a few days ago. My work on the
SAE committee showed me wider horizons, and gave me a broader
perspective, than I had had while ensconced in my consultant's
In that committee, I met US Navy aviation people with experience
in RCM (unconnected with my employer, or with Aladon) who took
one look at "reverse RCM" and recoiled -- then returned to make
biting and unanswerable comments about it. I met commercial
people with experience in RCM -- in the steel industry, in the
chemical processing industry -- who had the same reaction and the
same comments. After some very serious soul-searching, I finally
decided that I could not continue to support my employer's efforts
to encourage people to use "reverse RCM" -- and also retain my
sense of professional integrity.
So I left them, about a year and a half ago, once the SAE RCM
standard was largely put to bed and our Navy client's interests
were protected. I now own my own small consulting firm, Athos
Corporation, which is a member of the Aladon Network. (BTW, my
Aladon license is restricted to North America, so I have no
commercial interest in Australia.) As I had expected, and as I am
sure that people here can appreciate, starting a new business from
scratch is no gold mine by any means; but I go to bed at night with
a clear conscience.
Now then. I'd like to say two things about the SAE standard.
First, I'd like to address the reasons why someone might like to
use it. Then, I'd like to address what questions it does *not*
answer -- because there are some important questions that it was
deliberately intended to sidestep.
1. Why use the SAE standard?
As people can appreciate, I'm sure, I have spoken to a lot of people
about the SAE standard over the past several years -- in many
cases from conference platforms (in the USA). I have seen a few
complaints about it, in the year or so since it was published in Oct
So far, every complaint has come from a consultant.
So far, every comment I have seen from a user has ranged from
favorable to devoutly grateful.
Because the standard is not intended to meet the needs of
consultants. It is intended to meet the needs of users.
Consultants need to establish the credibility of their process to
their prospective clients. If this means attaching a recognized TLA
(three-letter acronym) to whatever the heck it is that they do, then
hooray, go for it! (All too often.)
Users need to know what sort of pig is inside this poke that has
this label on it, this TLA that seems to say that the pig is such-and-
so. When they buy a poke with *this* label on it, users need to be
confident that they are getting *this* sort of pig inside it. (This is
especially tricky when the users are not yet experts in the process
they are about to embark on.)
(Is everyone in this international forum familiar with the slang
phrase, "buying a pig in a poke"? In the US, at least, this means
"buying something sight unseen" -- something that is *said* to be a
pig, in an unopened sack or bag (an unopened "poke"). And it is
almost always used to describe something you don't want to do
("Oh, I don't want to buy a pig in a poke"), because it carries the
implication that what you are carrying away after the purchase is
probably not what you thought you were buying. That's how I'm
using the phrase here.)
In the 20+ years since the US Department of Defense published
Nowlan & Heap's report, that TLA "RCM" has been attached to a
heck of a lot of pokes, with a heck of a lot of different kinds of pigs
Peter's e-mail, below, shows one kind of non-Nowlan-and-Heap pig
that gets stuffed into the RCM poke. He asserts that "basic" RCM
consists of a single sensible engineer with access to the plant's
list of its physical assets and to its accounts department, an
engineer who has tools such as the following:
- FMECA (Failure Mode Effects and Criticality Analysis),
- FTA (Fault Tree Analysis),
- Pareto Analysis,
- Block Diagrams,
- Weibull Analysis, and
- an understanding of Risk / Cost Management.
Of these tools, at least four are not mentioned at all by Nowlan and
Heap: FTA, Pareto Analysis, Block Diagrams, and Weibull
(N&H have a decision logic tree, but it is a different logic tree from
the one customarily used in Fault Tree Analysis. Being aviation
people, they are focused on airplanes, and they assume that the
entire airplane will be reviewed -- therefore they do not use Pareto
Analysis to decide which assets do not deserve a review. Their
process does not require Block Diagrams, though it might use
such diagrams if already available. And the failure curves they
developed -- the famous six curves -- were not produced by Weibull
Analysis, but by a different process that neither uses nor generates
mathematical equations (such as the Weibull function). (Appendix
C of their report, "Actuarial Analysis", describes their analytic
Of the remaining tools, the process used by N&H to examine
failure modes is not the same as the FMECA process that is
described in the various FMECA standards available from the US
military, the SAE, and other sources. For one thing, N&H use the
crucial term "failure mode" to refer to something that FMECA does
*not* call a "failure mode". (We in the SAE RCM subcommittee
found this out when we attempted to establish liaison/contact with
the SAE subcommittee that is struggling to write a new FMECA
And "an understanding of risk/cost management" is far looser, far
more vague, than N&H's very specific process for addressing risks
with respect to safety and economics. Does Peter mean N&H's
approach to "risk/cost management"? Or does he mean someone
else's approach? Or one he came up with on his own?
So. Peter's process may be very useful. It probably borrows
valuable concepts and features from Nowlan and Heap's report. I'm
sure he feels he has gotten very good results with it. I have no
reason to dispute his results.
But the process he described in his e-mail is not the process that
Nowlan and Heap lay out in their report.
So someone who *wants* Nowlan and Heap's process, and who
hires Peter because Peter says he uses "RCM" (the name that
Nowlan and Heap made famous), is likely to get something
different from what he (the user) wants.
I don't pretend to know what's happening in Australia. I do know
that, very quickly after the US Department of Defense decided to
abandon Military Standards in the mid-1990s, the US Navy started
getting pokes bearing the label, "RCM", with Joe Blow's pet
process inside -- processes that had had only the most glancing
contact (if any) with Nowlan & Heap's process.
Having had this experience, the US Navy got into the SAE's RCM
standard project very quickly indeed because, as a user, the US
Navy wanted to be sure -- to be *sure* -- that when it asked for
RCM, it could predict what the Sam-Hill it was going to get.
Which is how *I* got into the project (on behalf of our US Navy
client). To help make a standard that would help *users* know
what they were getting when they asked for "RCM".
I don't know all of the organizations that have formally used the
SAE standard so far, of course. I do know that the US Bureau of
Land Management, in the US Department of the Interior, recently
used the SAE standard in an RFP for consulting services. They
seem to have been quite pleased to have access to this tool during
their procurement process.
2. What questions does the SAE standard *not* answer?
When I speak to people about the SAE standard -- and as the
chairman I have spoken to a lot of people about it in the last couple
of years -- I point out that there are two entirely separate questions
that one must answer when setting out to select a process (and
sometimes, by implication, to select a consultant).
The first question is: "is this process RCM?" This is the question
that SAE JA1011 is intended to answer.
If the answer is "yes", the second question is: "is this (RCM)
process cost-effective?" SAE JA1011 does nothing whatsoever to
help answer this question.
And it is a serious question. You see, it is possible to use any
process either well or poorly. As I put it in my presentations, you
can do RCM smart -- or you can do RCM "stoopid". :-)
It's still RCM. But one way is cost-effective, and the other way isn't.
A number of people here have probably heard some consultants
complain about the "vast expense", and "relatively low return", of
SAE-compliant RCM. I worked for one such firm, here in the US. I
have seen presentations by another such American consulting firm.
In both cases, the people in the firms were sincere in their
complaints. They based their complaints on their own experience.
Why did they make these complaints?
I'll tell you why. Because they did RCM "stoopid". Their approach
was wasteful. They organized and trained themselves and their
clients in such a way that it took them forever to get anything done --
and then, when it got done, it was sometimes a toss-up whether
it would actually be implemented in the plant.
(The most visible defect was in the training. You don't learn to ride
a bike proficiently by reading a book, or by taking a few days of
classroom training. You certainly don't learn to use sophisticated
and complex engineering analytic techniques proficiently that way.
Why should anyone expect someone to be able to learn to do
RCM proficiently that way?)
(The next most-visible defect was in the change-management area
known as "buy-in". My old firm generated a lot of "shelfware" in the
early 1990s, doing one-man analysis "on behalf of" the client
(saved him a lot of work, didn't it?) whose reports were put on a
shelf and forgotten -- because the people who were responsible for
implementing the recommendations saw no reason to make those
And then, having taken forever to get things done (if anything really
*got* done at all), these consultants complained about RCM itself.
And cast about for ways to "streamline" the process itself --
instead of looking for ways to organize and train themselves better.
Personally, I am persuaded that it is possible to use an SAE
JA1011-compliant process in a way that is cost-effective. It is not
easy to come up with such a way (if it were, then everyone would
have one!), but I think that at least one cost-effective SAE JA1011-
compliant process does exist. I think that the US Naval aviation
people on my subcommittee would say that at least two such
But this is not a given. You don't always find that an SAE JA1011-
compliant process is also a cost-effective process.
Those users who are concerned about the cost-effectiveness of the
RCM process they might use would be well-advised to take the
measures they would take when embarking on the use of any other
sort of new process:
Decide what cost-effectiveness metrics are important to you, then
check the track record of that particular process, and see what sort
of experience others have had with it.
Learning to use a cost-effective RCM process at your site is not
simple, and it is not easy. But was it simple or easy to build your
site in the first place? How many important things are simple and
-- Dana Netherton
My two cents worth :-)
From: Terrence O'Hanlon
These debates about RCM and other strategies/methods are very interesting.
Peter, Dana and John are obviously very experienced and well versed in what
they do. I appreciate the detailed explanations as they provide a solid
understanding of the foundations for these approaches. I doubt that these
explanations will go very far to convince the "other side" to lay down its
I think that reports on actual results would be as interesting as the RCM
debate if not more so! It would be even better to hear from field
practitioners! Why did LTV file for bankruptcy (citing "foreign" competition)
while DoFasco (who must have the same competition) is growing its profits?
Did their approach to asset management have anything to do with it? What is
the "foreign" competition doing to be so competitive?
How about a New Year resolution to end the RCM debate and start filling this
list with stories about what is working (and what is not) in the real
world! Does anyone else agree?
RCM, RCM2, PMO, TPM, PdM, CBM and PM (now Plant Services Magazine
www.plantservices.com is promoting EMM - Effective Maintenance Management!)
all look great on paper.
What is working for you?
Happy New Year!
From: Thomas Purackal
I would like to learn what the differences are between RCM, RCM-1 and RCM-2. I request
experts such as Mr. Peter, Mr. John and others to write.
From: John Moubray
As a very first response to your query, please refer to Aladon's website at
www.thealadonnetwork.com. The first two or three pages of this website give a
brief description of RCM and RCM2.
If you want to study the original text of the document that first described
"Reliability-centered Maintenance", you need to get hold of a copy of the
report entitled "Reliability-centered Maintenance" by F Stanley Nowlan and
Howard Heap from the National Technical Information Service, Springfield,
If you want to study RCM2 in more detail, get hold of a copy of my book
entitled "Reliability-centered Maintenance" (US edition) from
www.amazon.com, or "Reliability-centred Maintenance" (UK edition) from
As the correspondence you have been reading indicates, there are a great many
other processes that use the term "RCM" currently on the market. Some can
legitimately be called RCM. Some cannot. The SAE Standard JA1011 was
developed to help users to determine the difference. To obtain a copy of the
standard, visit the SAE website at www.sae.org, and enter the number
"JA1011" into the first search field that you encounter. This will take you
to the page that will enable you to download a copy of the standard for a
sum of US$59. (I have learned that some national standards organisations
charge a lot more for this already expensive document. If you can, I suggest
that you try to order it direct from the SAE.)
From: Dana Netherton
Regarding your query, I'll say this:
"RCM", as defined by SAE JA1011, is a 7-step analytic process,
based directly on the process presented in Nowlan & Heap's 1978
US government report.
No specific process bears the name "RCM-1" or "RCM1".
However, RCM processes that are derived directly from Nowlan &
Heap's report (such as the two official US Navy RCM processes,
one for ships and the other for aircraft, both written in the early
1980s), might be called "RCM Mark 1".
One way in which Nowlan & Heap differ from SAE JA1011 is that
N&H address "safety" as an explicit issue to be managed, but do
not address "the environment" as an explicit issue. In the late
1970s, who did?
But today, who does not? An SAE standard that did *not* address
the environment as an explicit issue would do its users a grave
disservice. So SAE JA1011 requires an RCM process to address
the environment as an explicit issue to be managed.
"RCM-2" or "RCM2" is Aladon's RCM process, first defined in the
first edition of John's book in 1991 (about 13 years after the
publication of N&H's report). It does address the environment, and
has a number of other enhancements that motivated John to call it
"RCM Mark 2", or "RCM2". (The final chapter of his book has a
summary of these enhancements.) RCM2 does comply fully with
RCM2 also has some important features in areas outside the
analytic process itself, areas not addressed by SAE JA1011. You
may recall that I mentioned, in my earlier post, that a number of
consultants tried to apply RCM, and failed to get results that were
worth the cost of the effort. You may recall that I said that the
roots of the problem generally lay in the way they organized and
trained people (client people and their own people). And that I said
that these consultants failed to address those issues, but instead
tinkered with the analytic process itself.
RCM2 adds specific ways of organizing and training people so that
they are most likely to apply the RCM process successfully. I'm
not going to go into those things here -- see John's book for an
introductory discussion of them; if you find that you want details,
I'm sure that John or I can refer you to people who can give them in
a better forum than via e-mail. But in my view these points are the
most important features that differentiate RCM2 from the other SAE-
compliant processes out there.
Back to Jim, and perhaps others,
In case any independent comment is needed about John's book, I'll
make it, and gladly. I have read all of the books titled "Reliability-
Centered Maintenance", and IMNSHO John's book is the best,
hands down. Nowlan & Heap's report is just that, a report. It's
groundbreaking and gives mind-bogglingly important background
information about the development of RCM, but it doesn't try to
*teach* RCM. John's book is a textbook. And a much better one
than the other books rattling around the bookstore shelves,
As to your suggestion, well thanks! I appreciate the compliment!
Frankly, though, I don't feel a need to write a book of my own, given
that John's book is already out there.
Again, in the interests of fair play, I will remind folks here that my
firm is a member of the Aladon Network. This means that I have
signed a license agreement with John's company to use his
training materials, including his book (as a textbook). I did that
specifically so that I *could* get access to them. I chose to go in
that direction because I've seen the training materials out there and
I think that John's (Aladon's) are the best.
Further, in the interests of fair play, those who have seen the most
recent edition of John's book (2.3), will know that John has
obtained my permission to use portions of my magazine article
(portions that talked about the history of RCM) in this edition of his
book. I received no financial compensation for giving that
permission, just a word of thanks in the acknowledgments. :-)
And, again, I have no commercial interest in Australia, NZ, or
elsewhere outside North America, so whether the Aussies, Kiwis,
or Brits agree with me makes no commercial difference to me. :-)
Oh, and do note that John said, "get hold of a copy", not "buy a
copy". If Kim has a friend or colleague who has a copy, I'm sure
John would be happy to see Kim borrow it (so long as Kim reads
Copyright 1996-2009, The Plant Maintenance Resource Center . All Rights Reserved.
Revised: Thursday, 08-Oct-2015 11:53:49 AEDT