AN 5948
Reliability of High Power Bipolar Devices
Application Note
AN5948-2 September 2009 LN26862
Authors: Dinesh Chamund, Colin Rout
INTRODUCTION
We are often asked, “What is the MTBF or FIT rating of this diode or that thyristor?” We cannot answer this without knowing how the customer intends to use these devices in a system and under what operating conditions. In other words, we would need to know the “Mission Profile”.
MTBF is the “Mean Time Between Failures”. It is a measure of the average time for a second component to fail after the failure of a first component in a system. MTBF usually applies to a repairable system consisting of many components. Knowing the MTBF allows the system designer to recommend a repair or maintenance schedule for the system and so deduce its running cost.
For semiconductor devices, MTTF (Mean Time To Failure) is generally the appropriate measure. However, MTBF and MTTF have the same value if the time to repair a system is negligible, so MTBF is loosely used to mean MTTF for semiconductor devices.
FIT (Failure unIT, or Failure In Time) is a unit for measuring the failure rate (λ) of components. One FIT equals one failure per billion (10⁹) device-hours. Both MTBF and λ are statistical quantities. If the failures follow an exponential distribution (that is, the failure rate is constant), each is the reciprocal of the other (MTBF = 1/λ). The failure rate is useful for predicting the life of a device.
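The FIT/MTBF relationship above can be illustrated with a short Python sketch. It converts between the two figures and estimates the probability that a single device survives a given mission time. It assumes a constant failure rate (exponential distribution), which is the only case in which MTBF = 1/λ holds. The function names and the example figures are hypothetical.

```python
import math

FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Convert a constant failure rate in FIT to MTBF (= MTTF here) in hours."""
    return FIT_HOURS / fit


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Convert an MTBF (MTTF) in hours back to a failure rate in FIT."""
    return FIT_HOURS / mtbf_hours


def survival_probability(fit: float, mission_hours: float) -> float:
    """R(t) = exp(-lambda * t); valid only for a constant failure rate."""
    failures_per_hour = fit / FIT_HOURS
    return math.exp(-failures_per_hour * mission_hours)


if __name__ == "__main__":
    fit = 100.0                     # hypothetical device rating
    ten_years = 10 * 8760.0         # mission time in hours, ignoring leap days
    print(fit_to_mtbf_hours(fit))   # 10000000.0
    print(round(survival_probability(fit, ten_years), 5))  # 0.99128
```

A very large MTBF (here 10⁷ hours) does not mean that an individual device will last that long. It only expresses the average under the constant-rate assumption. The survival probability is often the more intuitive figure for a mission profile.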
The purpose of this Application Note is to discuss the reliability of high power bipolar devices (diodes, thyristors and GTOs). This is considered in relation to the different failure mechanisms, the materials used in packaging the devices and the manufacturing processes used. The different methods used to predict reliability are also discussed, together with the pros and cons of each method.
DEVICE CONSTRUCTION
Fig. 1 Pressure contact thyristor construction
Fig. 1 shows a typical construction of a fully floating pressure contact thyristor and the materials used. The silicon wafer is sandwiched between a molybdenum washer and a molybdenum disc, which provide electrical contact to the active parts of the device. These are further sandwiched between two copper pole pieces, one in the ceramic housing and the other in the lid. The housing is backfilled with inert gas and the copper lid is cold welded to the ceramic housing. In non-fully floating construction, the molybdenum disc is alloyed to the silicon wafer. Electrical and thermal
contact is made by clamping the pole pieces
under pressure.
THE CONCEPT OF RELIABILITY
Reliability is a design engineering discipline that applies scientific knowledge to ensure that a product will perform its intended function for the required duration within a given environment. This includes designing in the ability to maintain, test and support the product throughout its total life cycle.
There are several definitions of reliability. IEC 50(919):1990 defines reliability performance as “The ability of an item to perform required function under given conditions for a given interval of time.”
Understanding the reliability of any product requires a basic understanding of its failure mechanisms and of how the failure rate is determined.
FAILURE MECHANISMS
Failures of pressure contact power semiconductors can be classified into two main categories:
• random failures
• wear-out failures
Depending on the application and the duty cycle within that application, either of these failure mechanisms can dominate.
Random failures:
These failures are caused by external accidental events that lead to momentary over-stress, such as particle radiation, voltage transients and damage during service actions. This type of failure is not related to the length of service or the age of the device. Fig. 2 shows a typical failure site due to cosmic ray activity.
Fig. 2 Failure due to cosmic rays causing damage to the silicon crystal lattice, which gives rise to immediate and catastrophic failure of the device
Wear-out failures:
These failures are attributed to the accumulation of incremental physical damage under the operating load (stress) conditions, which alters the device properties beyond their functional limits. Examples are mechanical wear-out due to the expansion and contraction caused by cyclic power loading, and ionic drift in the junction passivation, which leads to an increase in leakage current and eventual voltage breakdown.
Cosmic Ray induced failure
Failure due to cosmic rays was first postulated in the early 1990s. It was proposed to explain an unexpectedly high failure rate of GTOs in railway locomotives running with higher DC-link voltages than before. The failures were random and sudden, with no previous overload condition or signs of wear-out. The cause of this failure mechanism is postulated to be neutrons with energies above 10 MeV, produced when cosmic rays collide with the upper atmosphere. When one of these neutrons hits the silicon lattice, it generates electron-hole pairs. If the electric field is high enough, the electrons and holes are accelerated to sufficient energy to cause avalanche multiplication and consequent device breakdown.
The failure rate is exponentially related to the bias voltage and proportional to the time spent at that voltage. This failure mechanism therefore needs to be considered only in applications where the device sits at a high DC voltage relative to its rating. There is no easily applied universal formula for the failure rate, because it depends on the electric field profiles within the silicon, and these depend on the device design philosophy. Generally, devices intended for such applications have a maximum DC voltage quoted in their datasheets, corresponding to a rating of 100 FIT at 100% duty cycle.
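There is no universal formula, but the datasheet figure can still be adjusted for duty cycle, because the failure rate is proportional to the time spent at voltage. The Python sketch below is a minimal illustration under two assumptions. First, the device is biased at the datasheet's 100 FIT voltage for a known fraction of its operating time. Second, device failure rates are constant and independent, so FIT values add across a system in which any single device failure stops the system. The function names, the 40% fraction and the 12-device count are hypothetical.

```python
import math


def cosmic_ray_fit(datasheet_fit: float, fraction_of_time_at_voltage: float) -> float:
    """Scale a datasheet FIT figure (quoted at 100% duty cycle) by the fraction
    of time the device actually spends biased at that DC voltage.

    Assumption: linear scaling with time at voltage. If the device sits below
    the datasheet voltage, the true rate is lower, so this is pessimistic.
    """
    if not 0.0 <= fraction_of_time_at_voltage <= 1.0:
        raise ValueError("fraction_of_time_at_voltage must lie between 0 and 1")
    return datasheet_fit * fraction_of_time_at_voltage


def system_fit(device_fits: list[float]) -> float:
    """Total FIT of devices that each bring the system down when they fail.
    Constant, independent failure rates simply add."""
    return math.fsum(device_fits)


if __name__ == "__main__":
    per_device = cosmic_ray_fit(100.0, 0.4)   # hypothetical: 40% of time at rated DC volts
    total = system_fit([per_device] * 12)     # hypothetical converter with 12 such devices
    print(per_device, total, 1e9 / total)     # FIT per device, system FIT, MTTF in hours
```

This approach does not extrapolate to voltages above the datasheet value. Because of the exponential voltage dependence, such an extrapolation would need the manufacturer's own field-profile data.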
Mechanical Wear
In a power semiconductor, increases and decreases in the temperature of the device cause the various internal components to expand and contract. Table 1 gives the linear coefficient of thermal expansion for materials commonly found inside such devices. In large-diameter, high-reliability devices, the internal components are pressed together by a clamp and molybdenum buffers are used between the silicon wafer and the copper electrodes. In some smaller-diameter products, however, the copper electrode is in direct contact with the silicon wafer. The difference in the coefficients of thermal expansion causes movement of one component relative to its neighbours, with a resultant scrubbing action. This scrubbing eventually degrades the device characteristics. Initially the forward voltage drop increases, but eventually the silicon becomes chipped and the device loses its voltage blocking capability.
Table 1: Material properties
Material        Linear coefficient of expansion at 20°C (×10⁻⁶ per °C)
Silicon         4.2
Copper          16.5
Aluminium       23.95
Molybdenum      5.2
Silver          18.9
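To give a feel for the size of the relative movement, the Python sketch below uses the coefficients in Table 1. It estimates the radial displacement between two components at a given radius for a given temperature swing. The model is deliberately simplified: it assumes both parts expand freely from a common centre, which is consistent with the radial wear marks described for Fig. 3a. It ignores friction, clamping force and the temperature dependence of the coefficients. The radius and temperature swing in the example are hypothetical.

```python
# Linear expansion coefficients from Table 1, in units of 1e-6 per °C.
CTE_PPM_PER_C = {
    "Silicon": 4.2,
    "Copper": 16.5,
    "Aluminium": 23.95,
    "Molybdenum": 5.2,
    "Silver": 18.9,
}


def radial_mismatch_um(material_a: str, material_b: str,
                       radius_mm: float, delta_t_c: float) -> float:
    """Relative radial displacement (in micrometres) at radius_mm between two
    materials expanding freely from a common centre over a temperature swing
    of delta_t_c. Simplified model: no friction or clamping effects."""
    delta_alpha = abs(CTE_PPM_PER_C[material_a] - CTE_PPM_PER_C[material_b]) * 1e-6
    radius_um = radius_mm * 1000.0
    return delta_alpha * radius_um * delta_t_c


if __name__ == "__main__":
    # Hypothetical example: 25 mm radius, 40 °C power-cycling swing.
    for partner in ("Copper", "Molybdenum"):
        print(partner, round(radial_mismatch_um("Silicon", partner, 25.0, 40.0), 2))
```

With these hypothetical numbers, the copper/silicon mismatch is about 12.3 µm per cycle, against about 1.0 µm for molybdenum/silicon. This shows why molybdenum buffers reduce the scrubbing action in large devices.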
Fig. 3a Thermo-mechanical wear-out
Fig. 3b Expanded view of the wear-out
Figs. 3a and 3b show a typical example of thermo-mechanical wear-out failure. Note
that in Fig. 3a the wear-out marks are radial with respect to the centre of the device. In the expanded view of the failure site (Fig. 3b), the scrape marks from the sliding action can be seen at the bottom of several wear areas.
Ionic Drift in the passivation
The silicon surface that supports the blocking voltage of the device is passivated with one of a number of compounds, depending on the structure. The passivation has several functions. Primarily it is a high-dielectric material used to confine the electric field, but it also traps mobile ionic charge that may be present on the silicon surface. Under the influence of the applied electric field, this ionic charge may drift into a region of high field strength. If it does, it can cause excess leakage current and a resultant degradation in the voltage blocking performance of the device.
This phenomenon is largely limited to very-high-voltage devices. Manufacturers of these products subject their devices to a short “burn-in” to precipitate early-life failures due to this mechanism. Once any early failures have been removed, the failure rate is extremely small.
PREDICTIVE RELIABILITY
Many engineering disciplines incorporate reliability engineering and employ its tools and methods, such as predictive reliability, Weibull analysis, reliability testing and accelerated life testing.
The purpose of predictive reliability is to evaluate the failure rate (λ) or the MTBF of the device for a specified lifetime. The failure rate of a large population of similar, non-repairable items follows a typical bathtub curve (Fig. 4) with the following three phases:
1. Early failures:
where λ(t) generally
decreases rapidly with time. The
failures in this phase are attributed to
randomly distributed weaknesses in materials, components or production processes. Burn-in or environmental stress screening is used to eliminate early failures. This phase is also called infant mortality.
2. Useful life:
where the failure rate is approximately constant and is therefore useful for reliability calculations. The failures are intrinsic and random (mainly related to failure of the silicon material).
3. Wear-out failures:
where λ(t) increases with time. The failures in this phase are attributed to degradation phenomena due to ageing, fatigue, wear, etc.
Fig. 4 The Bathtub Curve (failure rate λ(t) against time t, showing the early failures, useful life and wear-out failures regions)
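The three phases can be illustrated with the Weibull hazard function, a standard model in the Weibull analysis mentioned above. A shape parameter below 1 gives a decreasing failure rate, a shape of exactly 1 gives a constant rate, and a shape above 1 gives an increasing rate. In the Python sketch below, the hazards of three independent failure modes are summed to produce a bathtub-shaped λ(t). This is an illustrative modelling assumption, not a method from this note, and all parameter values are hypothetical. The Weibull shape parameter is often written β in the literature; it is unrelated to the voltage-stress constant β used later.

```python
def weibull_hazard(t_hours: float, shape: float, scale_hours: float) -> float:
    """Weibull hazard (instantaneous failure rate, per hour) at time t > 0:
    h(t) = (shape / scale) * (t / scale) ** (shape - 1).
    shape < 1: decreasing (early failures); shape == 1: constant
    (useful life); shape > 1: increasing (wear-out)."""
    if t_hours <= 0:
        raise ValueError("t_hours must be positive")
    return (shape / scale_hours) * (t_hours / scale_hours) ** (shape - 1.0)


def bathtub_hazard(t_hours: float) -> float:
    """Illustrative bathtub curve: the hazards of independent competing
    failure modes add. All parameter values are hypothetical."""
    early = weibull_hazard(t_hours, shape=0.5, scale_hours=1e7)
    random_ = weibull_hazard(t_hours, shape=1.0, scale_hours=1e7)
    wearout = weibull_hazard(t_hours, shape=4.0, scale_hours=2e5)
    return early + random_ + wearout


if __name__ == "__main__":
    for t in (1.0, 1e2, 1e4, 1e5, 3e5):
        print(f"t = {t:>9.0f} h   lambda(t) = {bathtub_hazard(t):.3e} per hour")
```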
Some of the methods used in the semiconductor industry to predict reliability are:
• Field failure experience
• Qualification procedure
• Theoretical calculation
• Physics of failure method
Field failure experience:
This method involves the collection and analysis of all field failures, as well as of the system integration. Its advantage is that it gives the best reliability evaluation. The main drawback is the difficulty of collecting the data and ensuring its integrity (use duration, failure context, and quantity of parts used, all with reliable accuracy). This method may not be suitable for small- or medium-volume power semiconductor manufacturers, as the quantities involved may not be statistically significant.
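The basic arithmetic behind a field failure evaluation is a point estimate of the failure rate from the accumulated device-hours. The minimal Python sketch below assumes that every unit has accumulated the same operating hours and that the failure count is complete. These are exactly the data-integrity issues noted above. The function name and figures are hypothetical.

```python
def field_fit_point_estimate(failures: int, units_in_service: int,
                             hours_per_unit: float) -> float:
    """Point estimate of the failure rate, in FIT, from field data.

    Assumes every unit accumulated the same operating hours and that the
    recorded failure count is complete and correctly attributed.
    """
    device_hours = units_in_service * hours_per_unit
    if device_hours <= 0:
        raise ValueError("no accumulated device hours")
    return failures * 1e9 / device_hours


if __name__ == "__main__":
    # Hypothetical: 2 failures among 500 units that have each run 20 000 h.
    print(field_fit_point_estimate(2, 500, 20_000.0))
```

Here the point estimate is 200 FIT. With zero failures the estimate would be 0 FIT, which says nothing useful about the true rate. This is why small populations are not statistically significant, and why confidence-bound methods such as the chi-square approach are used.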
Qualification procedure:
The principle behind this method is to qualify a product against a test plan with defined conditions, such as those given in international standards and/or a reference test plan. The obvious advantages of this method are that all companies in the same industry sector use the same evaluation process, and that there is no additional cost for study (test plan definition). The major disadvantage is that the test plan can become obsolete when new technology is considered. The test plan can also be very general and not exactly adapted to the application (choice of stress constraints).
Table 3 shows the standard qualification tests
(based mainly on the IEC Standard) adopted
by Dynex during the product release stage
and the maintenance of the qualified product.
Theoretical calculation:
The traditional method of calculating the failure rate uses accelerated life testing of the device. A random sample of devices is taken from the parent population and subjected to a stress test under accelerated conditions to promote failures. The acceleration factor (AF) between test and end-use conditions is then used, together with a predetermined statistical model, to extrapolate the test results to end-use conditions. This gives an estimate of the failure rate in field applications. For thermally/electrically activated failures, the modified Arrhenius equation (1) is used in conjunction with the chi-square statistical model, equation (2).
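$$AF = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]\left(\frac{V_2}{V_1}\right)^{\beta} \tag{1}$$
$$\lambda = \frac{\chi^2}{2\,T_D\,AF} \times 10^{9}\ \text{FIT} \tag{2}$$
where
$AF$ = acceleration factor
$E_a$ = activation energy (eV)
$k$ = Boltzmann constant (8.617 × 10⁻⁵ eV/K)
$T_{use}$ = application temperature (K)
$T_{stress}$ = stress temperature (K)
$V_2$ = test voltage
$V_1$ = application voltage
$\beta$ = constant for voltage stress
$\lambda$ = failure rate (FIT)
$\chi^2$ = chi-square value at the chosen confidence level (for a time-terminated test with r failures, evaluated with 2r + 2 degrees of freedom)
$T_D$ = total device hours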
From equation (2), for a given number of failures, a higher number of device hours ($T_D$) gives a lower failure rate. To accumulate a high number of device hours, therefore, either large numbers of devices must be tested or the test must run for much longer. Such statistics are accumulated over a number of years of regular testing of the product. The unknown parameters in equation (1) are the activation energy $E_a$ and $\beta$. $E_a$ is a constant in the Arrhenius equation and is related to the kinetics of the underlying physical process under temperature stress, while $\beta$ is a constant related to the voltage stress. These constants are determined experimentally.
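Equations (1) and (2) can be evaluated together, as in the Python sketch below. The sketch assumes that the voltage term has the power-law form (V₂/V₁)^β and that the temperatures are converted to kelvin, as the Arrhenius term requires. It uses the closed form of the chi-square value for zero failures (2 degrees of freedom, χ² = −2 ln(1 − CL)). For r > 0 failures, the value would instead come from a chi-square table or, for example, `scipy.stats.chi2.ppf(confidence, 2*r + 2)`. All parameter values in the example (activation energy, β, temperatures, voltages and sample size) are hypothetical and are not manufacturer data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV/K (E_a is in eV)


def acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float,
                        v_test: float, v_use: float, beta: float) -> float:
    """Modified Arrhenius acceleration factor with a power-law voltage term:
    AF = exp[(E_a/k)(1/T_use - 1/T_stress)] * (V_2/V_1)**beta.
    Temperatures are entered in °C and converted to kelvin."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    thermal = math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))
    voltage = (v_test / v_use) ** beta
    return thermal * voltage


def chi2_zero_failures(confidence: float) -> float:
    """Chi-square value for a test with zero failures (2 degrees of freedom),
    whose quantile has the closed form -2 ln(1 - confidence)."""
    return -2.0 * math.log(1.0 - confidence)


def failure_rate_fit(chi2_value: float, device_hours: float, af: float) -> float:
    """Failure rate in FIT: lambda = chi2 / (2 * T_D * AF) * 1e9."""
    return chi2_value / (2.0 * device_hours * af) * 1e9


if __name__ == "__main__":
    # All values below are hypothetical, for illustration only.
    af = acceleration_factor(ea_ev=0.7, t_use_c=90.0, t_stress_c=125.0,
                             v_test=4000.0, v_use=2800.0, beta=2.0)
    t_d = 80 * 1000.0                    # 80 devices tested for 1000 h each
    chi2 = chi2_zero_failures(0.60)      # no failures, 60% confidence level
    print(f"AF = {af:.1f}, lambda <= {failure_rate_fit(chi2, t_d, af):.0f} FIT")
```

Because $T_D$ and AF both appear in the denominator, a higher acceleration factor has the same effect as more device hours. This is why the choice of $E_a$ and β, which are determined experimentally, matters so much to the final FIT figure.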
For cyclic stress, the Coffin-Manson equation (3) is used. This model predicts the number of