Probabilistic Evaluation of the Effect of Maintenance
Parameters on Reliability and Cost
Mohsen Ghavami Mladen Kezunovic
Electrical and Computer Engineering Department Electrical and Computer Engineering Department
Texas A&M University Texas A&M University
College Station, TX 77843-3128, USA College Station, TX 77843-3128, USA
*********@***.****.*** *******@***.****.***
Abstract: Preventive maintenance is performed to extend the equipment lifetime or at least the mean time between failures. Cost-effective maintenance scheduling is important due to budget constraints in the current situation, where reduction of operating and capital costs is the focus of the power industry. In order to establish cost-effective maintenance, quantitative evaluation of maintenance parameters is critical. In this paper, a probabilistic model to achieve cost-effective maintenance strategies is presented. Reliability indices, such as the mean duration, state probability and visit frequency of each state, are computed using Monte Carlo simulation and demonstrated using a numerical example. Further, cost analysis is performed by computing all associated costs, including inspection, maintenance and failure costs, based on the reliability indices.

Keywords: State diagrams, Deterioration, Maintenance, Inspection, Monte Carlo simulation

978-1-4244-5721-2/10/$26.00 2010 IEEE PMAPS 2010

I. INTRODUCTION

Utilities perform regular inspection, planned maintenance at a selected working state of components, and on-demand repair or replacement at the failure state of a component. They have always utilized maintenance programs to keep their equipment in desirable working condition for as long as it is feasible [1]. Probabilistic maintenance models and reliability-centered maintenance have been presented to optimize maintenance and reliability costs [2]-[10]. A risk-based approach is proposed for maintenance scheduling of circuit breakers in [5]. This approach is different from the other risk-based approaches in the way the risk is calculated. It utilizes the maintenance quantification models developed earlier to quantify the circuit breaker maintenance [12]-[13]. These approaches work well when there is continuous monitoring or the inspection rate is high, which results in a large amount of available data about the condition of the component.

In the literature, a state diagram is used to represent the deterioration process of the component [1]-[2]. It is assumed that the remaining time in each state is an exponentially distributed random variable [1]-[2], [4], [13]-[14]. With this assumption, the state diagram can be represented by a Markov process, and there are some analytical solutions for this model. In the literature, it is also analyzed whether these models are realistic or not, especially when there is a non-periodic inspection [3]. In this paper, the inspection/maintenance strategy is evaluated by a proposed model. In most maintenance strategies, the inspection is non-periodic and its frequency is increased at the end of the life cycle of the component. Also, the inspection intervals are deterministic, and the duration of the inspection is a constant number. The model proposed in this paper follows this kind of maintenance strategy.

This paper is focused on the way the life cycle of the component is implemented. Although representing the lifetime of the component in several discrete stages according to the deterioration levels is a well-known concept, this paper is different from the rest in the way the life cycle of the component is simulated in a selected algorithm. Specifically, the way the transition between the lifetime stages and inspection stages has been handled is explored. This model would be suitable particularly in the case of non-periodic inspection strategies. The transition time distribution between deterioration stages is assumed to be exponential, whereas the transition time from a deterioration stage to the inspection stage is assumed to be a constant number.

The paper is organized as follows. Section II discusses maintenance models using state diagrams and transition rates between the states. In Section III, a model is proposed to simulate the life cycle of the component. Cost analysis is discussed in Section IV. In Section V, a numerical example is presented and solved using Monte Carlo simulation to extract the reliability indices of the probabilistic maintenance model, followed by conclusions in Section VI.

II. MAINTENANCE MODELING USING STATE DIAGRAMS

It is a matter of common knowledge that component failures are divided into two categories: random failures and those arising as a consequence of deterioration. Note that these are state models, not Markov models, as there are no assumptions made about the time distributions of the individual transitions [1]. The process of deterioration can be thought of as a sequence of deterioration stages, shown in Fig. 1. In most applications, considering three deterioration stages, namely an initial stage (D1), a minor (D2) and a major (D3) deterioration stage, is sufficient [2]. If no maintenance is performed, a new component will run through all the stages in sequence. It is
reasonable that these stages can be defined by specific signs that appear in the component because of aging and are recognized by inspections. It is a good assumption (near reality) that the failure probability of the component arises from these consequences of the deterioration stages, and that the remaining time of the component in each stage is independent of the time for which the component has been in that stage.

Figure 1. State diagram for modeling the life cycle of the component (without maintenance)

Figure 2. State diagram with adding the maintenance state

The negative exponential probability distribution is the only continuous distribution that has the memoryless property [15], and it is used to represent the probability of such events. In the real world, most utilities conduct maintenance actions based on periodic or possibly non-periodic inspection. This means that the state of the system is completely unknown unless an inspection is performed [16].

Although more discussion is needed about inspection models, the previous model shown in Fig. 2 can be improved to the model shown in Fig. 3, which includes inspection states. There is an inspection state instead of the dotted line for maintenance in state D1. In this model, based on inspection results, two kinds of maintenance, M2 (minor maintenance) or M3 (major maintenance), can be performed, or the component will be left without any kind of maintenance if it is in state D1. The expected result of all maintenance actions is only improvement to the previous stage [6], [17]-[18]. In some literature, waiting periods after inspections are considered. Also, some contingencies where no improvement is achieved, or even some damage is done by maintenance activities, are reported [2]. For the sake of simplicity, these cases are not considered in this paper. If one assumes that the remaining lifetime in each state has an exponential probability distribution and the transition rates between states are constant numbers, the state diagram turns into a Markov process. There are some analytical methods to solve this probabilistic model and extract reliability indices such as mean durations, visit frequencies and mean time between failures [15], [19]. Monte Carlo simulation can be used to solve this probabilistic model when the answer is difficult to derive through analytical solutions. Maintenance models with this structure are discussed in the literature [2], [4], [13]-[14], [20].

Figure 3. State diagram for modeling the life cycle of the component with inspection/maintenance strategy. The transition duration between states is supposed to be exponentially distributed, so the transition rate is a constant number.

The most important assumption in the model shown in Fig. 3 is that the maintenance actions are not carried out on a predefined schedule. Based on regular inspections, it can be decided if and what kind of maintenance should be done. The decision after inspection can be either doing nothing (where the equipment is in deterioration stage D1) or carrying out a specific kind of maintenance, denoted by M2 or M3. As seen in Fig. 3, the inspection rates of the three deterioration stages can be equal, which means this approach holds for either periodic or non-periodic inspection. The probability of detecting a critical situation increases toward the end of the component's life cycle, and returning the component to the previous condition needs more effort. So, it is reasonable that the inspection rate is higher if the equipment is more deteriorated. A discussion and a model of non-periodic inspection are given in [3].

III. EXTRACTING RELIABILITY INDICES USING MONTE CARLO SIMULATION

The goal of this section is to devise a model that follows the maintenance strategy in the real world and then to solve the model using Monte Carlo simulation. To obtain results compatible with real-world maintenance strategies, two assumptions are made in this model. First, inspection rates should increase with the aging of the component. Most maintenance strategies utilized by utilities have non-periodic inspection rates that increase toward the end of the life cycle of the component. Second, the remaining time in the inspection state and the inspection duration are deterministic, and they cannot be modeled by exponential distributions. Thus, the model is no longer a Markov process, and the answer is hard to derive through analytical solutions. Therefore, the best way to solve the model proposed in this section is Monte Carlo simulation. In general, inspections are performed, which lead to one of three decisions:

Do nothing, if the component is still in the initial stage D1;
Carry out minor maintenance M2, if the component is in stage D2. This will return the device to stage D1;
Carry out major maintenance M3, if the component is in stage D3. This will improve the component condition to stage D2.
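The three post-inspection decisions can be written down as a small lookup. The following is only an illustrative sketch: the stage and action labels (D1 to D3, M2, M3) come from the text, while the function name `decide_after_inspection` and the label "none" for the do-nothing case are assumptions made here.

```python
# Illustrative encoding of the post-inspection decision rule.
# Stage/action labels follow the text; the function name is hypothetical.

def decide_after_inspection(stage: str) -> tuple[str, str]:
    """Return (action, stage after the action) for an observed stage."""
    rules = {
        "D1": ("none", "D1"),  # still in the initial stage: do nothing
        "D2": ("M2", "D1"),    # minor maintenance returns the device to D1
        "D3": ("M3", "D2"),    # major maintenance improves the condition to D2
    }
    return rules[stage]


for observed in ("D1", "D2", "D3"):
    action, next_stage = decide_after_inspection(observed)
    print(observed, action, next_stage)
```

Each maintenance action improves the condition by exactly one stage, which matches the assumption stated earlier that maintenance only restores the previous stage.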
In order to establish a maintenance strategy including these assumptions, the model shown in Fig. 3 should be improved to the model shown in Fig. 4. The main idea of this model is that the deterioration process and the inspection strategy are two parallel processes. This means the deterioration process does not change the next inspection time, which was determined at the last inspection. This is more realistic because the state of the component is completely unknown if the inspection is not performed [16]. In the model seen in Fig. 3, the inspection rate changes from the rate of stage D1 to the rate of stage D2 as soon as the component deteriorates by a single step. In the real world, inspection is non-periodic, which means the two rates are not equal, and the next inspection time is determined at the last inspection.
The main goal of this section is to develop an algorithm to simulate the life cycle of a component in a probabilistic model and to find the reliability indices by using Monte Carlo simulation. To explain the model and the proposed solution algorithm, a few iterations are traced below. The parameters in this model are:
inspection intervals [years];
inspection duration [years];
repair rate [1/years];
deterioration rate [1/years];
Suppose the component is in state D1 at time t0 = 0. It will transit either to state D2 or to inspection state I (the other situations are analogous: in each step, the component transits either to the next deterioration stage or to the inspection state). The transition from state D1 to state D2 is simulated with a random number, denoted by d12, drawn from an exponential distribution whose rate is the deterioration rate of stage D1. The transition from state D1 to inspection state I takes a constant time equal to the inspection interval with index 1 for the first inspection. In Fig. 3, the inverse of the inspection rate with index 1 likewise equals the inspection period, but here the inspection intervals are not exponentially distributed random variables. Thus, the component leaves state D1 either at (t0 + d12) or at t0 plus the first inspection interval. As seen in Fig. 4, the inspection interval, which is analogous to the inverse of the transition rate to the inspection states in Fig. 3, varies (it can take the value with index 1, 2 or 3), and it is determined at the last inspection.
Figure 4. A model to simulate the life cycle of the component with inspection/maintenance strategy. This model is solved by Monte Carlo simulation.

Assume that (t0 + d12) is later than t0 plus the first inspection interval, so first there is a transition from state D1 to inspection state I. This inspection shows that the component is still in state D1 and, according to the maintenance strategy, nothing is performed and the system returns to state D1. The inspection duration is approximately negligible in comparison to the inspection intervals, but it can be considered as a constant inspection duration. Thus, the component is returned to state D1 at time t0 plus the first inspection interval plus the inspection duration. The next transition can be either to state D2 or to the inspection state, as before. The most important point in this algorithm is that another random number for the transition time from state D1 to state D2 is not generated, because the time of deterioration does not change when an inspection is performed (inspection does not make any improvement). Therefore, the next transition time from state D1 to state D2 is still (t0 + d12). This is the main difference between this model and the conventional Markov-process-based model shown in Fig. 3. In that model, after returning from inspection state I1, a new number is generated for the deterioration time from state D1 to state D2, which is not compatible with reality. Therefore, the next leaving time of state D1 is either (t0 + d12) or the time of the next inspection, which lies one inspection interval (again the one with index 1) after the return to state D1.
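The D1 phase of the procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `leave_d1`, the parameter names and the numeric values are hypothetical, and the case where the deterioration instant falls inside an inspection is simply treated as a transition at (t0 + d12). The key point it illustrates is that d12 is drawn once and is not redrawn after an inspection, while the next inspection time is fixed at the previous inspection.

```python
import random


def leave_d1(det_rate, interval, duration, rng):
    """Simulate the stay in stage D1 under fixed-interval inspections.

    Returns (time at which D2 is reached, number of inspections made in D1).
    """
    t0 = 0.0
    d12 = rng.expovariate(det_rate)   # drawn once, never redrawn at inspections
    next_inspection = t0 + interval   # deterministic, set at the last inspection
    inspections = 0
    while next_inspection < t0 + d12:
        inspections += 1
        back_in_d1 = next_inspection + duration  # inspection finds D1, no action
        next_inspection = back_in_d1 + interval  # schedule the following inspection
    return t0 + d12, inspections


# Illustrative values only (rate in 1/years, times in years).
rng = random.Random(42)
reached_d2_at, n_inspections = leave_d1(det_rate=0.2, interval=1.0,
                                        duration=0.01, rng=rng)
print(reached_d2_at, n_inspections)
```

A Markov-style simulation would instead redraw the deterioration time after every return from the inspection state, which is exactly the behavior the text identifies as unrealistic.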
Suppose instead that (t0 + d12) is earlier than the first inspection time, i.e., the deterioration time from state D1 to state D2 is less than the inspection time. At time (t0 + d12), the component deteriorates to state D2; afterward there are two possible ways to leave state D2. It can transit either to state D3 or to the inspection state. The time of transition from state D2 to state D3 is obtained as a random number, denoted by d23, drawn from an exponential distribution whose rate is the deterioration rate of stage D2. As the deterioration transition from state D1 to state D2 is not predetermined (it is only a probabilistic model), the next inspection time does not change, which means the next inspection time is still t0 plus the first inspection interval, as before. As mentioned already, this is the main difference between this algorithm for Monte Carlo simulation and the other algorithms based on the Markov process model. In the Markov process models, a new random number is generated for the next inspection time in this case. Therefore, there are two possible transition times: (t0 + d12 + d23) and t0 plus the first inspection interval.

If (t0 + d12 + d23) is later than the inspection time, the component transits to the inspection state, where it is revealed that the component is in state D2 and that the specific kind of maintenance M2 is required. Maintenance is done immediately after inspection, and the component's condition returns to state D1. The maintenance action duration can be neglected in comparison with the inspection intervals, but it is an exponential random number, denoted by m2, whose rate is the repair rate with index 2. At time t0 plus the inspection interval, the inspection duration and m2, the component is repaired as new and returns to state D1. To continue the modeling of the life cycle of the component, new numbers for the transition times have to be derived using random number generators as before. Thus, there are two candidate times: the return time plus d12new for the transition to state D2, and the return time plus the inspection interval with index 1 for the transition to the inspection state.

Returning to the first assumption in the last paragraph, if (t0 + d12 + d23) is earlier than the inspection time, the inspection detects that the component is in state D3. After inspection, it is decided to perform the major maintenance denoted by M3. This kind of maintenance returns the component condition a single step back. Finally, at time t0 plus the inspection interval, the inspection duration and m3, it returns to state D2. Note that for the next iteration, the inspection interval is the one with index 2. It could be (t0+d12+d23+d3F) [...]

[...] and failure. The following notation is used:

CT = total expected annual cost
CF = the cost of repair or replacement paid after failure
CMx = the cost of maintenance action type x (M2 or M3)
CI = the cost of inspection

CT = CF × (frequency of failure state) + CM2 × (frequency of state M2) + CM3 × (frequency of state M3) + CI × (frequency of state I)

The cost paid after the failure of the component may include not only the repair or replacement cost; the cost of event consequences and damages to the entire system should also be included if the supply is interrupted by that failure. The same holds for maintenance and inspection, where there may be a cost for taking the component out of service. In these situations, the mean duration of the component being in each state is critical, and this time can contribute to the cost of that state. In some maintenance strategies, there is a waiting period until a suitable time for doing maintenance or inspection, in order to reduce the costs. If the duration of these periods is small compared to the inspection intervals, it can be neglected, which is why it is not included in the model shown in the previous section.

Another element in the cost estimation is the visit frequency, which is calculated based on the maintenance strategy. In a Markov process, the visit frequency of state j is the frequency of encountering state j from the other states. In the model in Fig. 3, every transition from state D1 to state I is counted in the visit frequency of state D1. Although there is a transition to the inspection state, the component is still in deterioration stage D1. So, such a transition should not be considered in the visit frequency and mean duration of state D1 [3]. In the model shown in Fig. 4, the same consideration applies when determining the reliability indices using Monte Carlo simulation. Therefore, [...]

V. NUMERICAL EXAMPLE

This section presents a numerical example based on the ideas in the past two sections. The input data are taken from [2]. The data were obtained from the analysis of a number of 230 kV air-blast circuit breakers with a total operating history of