Failure mode, effects, and criticality analysis

Failure mode, effects and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA). FMEA is a bottom-up, inductive analytical method which may be performed at either the functional or piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value. FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications, while various forms of FMEA predominate in other industries.

History

FMECA was originally developed in the 1940s by the U.S military, which published MIL–P–1629 in 1949.^[1] By the early 1960s, contractors for the U.S. National Aeronautics and Space Administration (NASA) were using variations of FMECA under a variety of names.^[2]^[3] In 1966 NASA released its FMECA procedure for use on the Apollo program.^[4] FMECA was subsequently used on other NASA programs including Viking, Voyager, Magellan, and Galileo.^[5] Possibly because MIL–P–1629 was replaced by MIL–STD–1629 (SHIPS) in 1974, development of FMECA is sometimes incorrectly attributed to NASA.^[6] At the same time as the space program developments, use of FMEA and FMECA was already spreading to civil aviation. In 1967 the Society for Automotive Engineers released the first civil publication to address FMECA.^[7] The civil aviation industry now tends to use a combination of FMEA and Fault Tree Analysis in accordance with SAE ARP4761 instead of FMECA, though some helicopter manufacturers continue to use FMECA for civil rotorcraft.

Ford Motor Company began using FMEA in the 1970s after problems experienced with its Pinto model, and by the 1980s FMEA was gaining broad use in the automotive industry. In Europe, the International Electrotechnical Commission published IEC 812 (now IEC 60812) in 1985, addressing both FMEA and FMECA for general use.^[8] The British Standards Institute published BS 5760–5 in 1991 for the same purpose.^[9]

In 1980, MIL–STD–1629A replaced both MIL–STD–1629 and the 1977 aeronautical FMECA standard MIL–STD–2070.^[10] MIL–STD–1629A was canceled without replacement in 1998, but nonetheless remains in wide use for military and space applications today.^[11]

Methodology

Slight differences are found between the various FMECA standards. By RAC CRTA–FMECA, the FMECA analysis procedure typically consists of the following logical steps:

Define the system
Define ground rules and assumptions in order to help drive the design
Construct system block diagrams
Identify failure modes (piece part level or functional)
Analyze failure effects/causes
Feed results back into design process
Classify the failure effects by severity
Perform criticality calculations
Rank failure mode criticality
Determine critical items
Feed results back into design process
Identify the means of failure detection, isolation and compensation
Perform maintainability analysis
Document the analysis, summarize uncorrectable design areas, identify special controls necessary to reduce failure risk
Make recommendations
Follow up on corrective action implementation/effectiveness

FMECA may be performed at the functional or piece part level. Functional FMECA considers the effects of failure at the functional block level, such as a power supply or an amplifier. Piece part FMECA considers the effects of individual component failures, such as resistors, transistors, microcircuits, or valves. A piece part FMECA requires far more effort, but is sometimes preferred because it relies more on quantitative data and less an engineering judgment than a functional FMECA.

The criticality analysis may be quantitative or qualitative, depending on the availability of supporting part failure data.

System definition

In this step, the major system to be analyzed is defined and partitioned into an indentured hierarchy such as systems, subsystems or equipment, units or subassemblies, and piece parts. Functional descriptions are created for the systems and allocated to the subsystems, covering all operational modes and mission phases.

Ground rules and assumptions

Before detailed analysis takes place, ground rules and assumptions are usually defined and agreed to. This might include, for example:

Standardized mission profile with specific fixed duration mission phases
Sources for failure rate and failure mode data
Fault detection coverage that system built-in test will realize
Whether the analysis will be functional or piece part
Criteria to be considered (mission abort, safety, maintenance, etc.)
System for uniquely identifying parts or functions
Severity category definitions

Block diagrams

Next, the systems and subsystems are depicted in functional block diagrams. Reliability block diagrams or fault trees are usually constructed at the same time. These diagrams are used to trace information flow at different levels of system hierarchy, identify critical paths and interfaces, and identify the higher level effects of lower level failures.

Failure mode identification

For each piece part or each function covered by the analysis, a complete list of failure modes is developed. For functional FMECA, typical failure modes include:

Untimely operation
Failure to operate when required
Loss of output
Intermittent output
Erroneous output (given the current condition)
Invalid output (for any condition)

For piece part FMECA, failure mode data may be obtained from databases such as RAC FMD–91^[12] or RAC FMD–97.^[13] These databases provide not only the failure modes, but also the failure mode ratios. For example:

**Device Failure Modes and Failure Mode Ratios (FMD–91)**
Device Type	Failure Mode	Ratio (α)
Relay	Fails to trip	.55
	Spurious trip	.26
	Short	.19
Resistor, Composition	Parameter change	.66
	Open	.31
	Short	.93

Each function or piece part is then listed in matrix form with one row for each failure mode. Because FMECA usually involves very large data sets, a unique identifier must be assigned to each item (function or piece part), and to each failure mode of each item.

Failure effects analysis

Failure effects are determined and entered for each row of the FMECA matrix, considering the criteria identified in the ground rules. Effects are separately described for the local, next higher, and end (system) levels. System level effects may include:

System failure
Degraded operation
System status failure
No immediate effect

The failure effect categories used at various hierarchical levels are tailored by the analyist using engineering judgment.

Severity classification

Severity classification is assigned for each failure mode of each unique item and entered on the FMECA matrix, based upon system level consequences. A small set of classifications, usually having 3 to 10 severity levels, is used. For example, When prepared using MIL–STD–1629A, failure or mishap severity classification normally follows MIL–STD–882.^[14]

**Mishap Severity Categories (MIL–STD–882)**
Category	Description	Criteria
I	Catastrophic	Could result in death, permanent total disability, loss exceeding $1M, or irreversible severe environmental damage that violates law or regulation.
II	Critical	Could result in permanent partial disability, injuries or occupational illness that may result in hospitalization of at least three personnel, loss exceeding $200K but less than $1M, or reversible environmental damage causing a violation of law or regulation.
III	Marginal	Could result in injury or occupational illness resulting in one or more lost work days(s), lossexceeding $10K but less than $200K, or mitigatible environmental damage without violation of law or regulation where restoration activities can be accomplished.
IV	Negligible	Could result in injury or illness not resulting in a lost work day, loss exceeding $2K but less than $10K, or minimal environmental damage not violating law or regulation.

Current FMECA severity categories for U.S. Federal Aviation Administration (FAA), NASA and European Space Agency space applications are derived from MIL–STD–882.^[15]^[16]^[17]

Failure detection methods

For each component and failure mode, the ability of the system to detect and report the failure in question is analyzed. One of the following will be entered on each row of the FMECA matrix:

Normal: the system correctly indicates a safe condition to the crew
Abnormal: the system correctly indicates a malfunction requiring crew action
Incorrect: the system erroneously indicates a safe condition in the event of malfunction, or alerts the crew to a malfunction that does not exist (false alarm)

Criticality ranking

Failure mode criticality assessment may be qualitative or quantitative. For qualitative assessment, a mishap probability code or number is assigned and entered on the matrix. For example, MIL–STD–882 uses five probability levels:

**Failure Probability Levels (MIL–STD–882)**
Description	Level	Individual Item	Fleet
Frequent	A	Likely to occur in the life of the item	Continuously experienced
Probable	B	Will occur several times in the life of an item	Will occur frequently
Occasional	C	Likely to occur some time in the life of an item	Will occur several times
Remote	D	Unlikely but possible to occur in the life of an item	Unlikely, but can reasonably be expected to occur
Improbable	E	So unlikely, it can be assumed occurrence may not be experienced	Unlikely to occur, but possible

The failure mode may then be charted on a criticality matrix using severity code as one axis and probability level code as the other. For quantitative assessment, modal criticality number $C_m$ is calculated for each failure mode of each item, and item crticality number $C_r$ is calculated for each item. The criticality numbers are computed using the following values:

Basic failure rate $\lambda_p$
Failure mode ratio $\alpha$
Conditional probability $\beta$
Mission phase duration $t$

The criticality numbers are computed as $C_m = \lambda_p \alpha \beta t$ and $C_r = \sum_{n=1}^N (C_m)_n$ . The basic failure rate $\lambda_p$ is usually fed into the FMECA from a failure rate prediction based on MIL–HDBK–217, PRISM, RIAC 217Plus, or a similar model. The failure mode ratio may be taken from a database source such as RAC FMD–97. For functional level FMECA, engineering judgment may be required to assign failure mode ratio. The conditional probability number $\beta$ represents the conditional probability that the failure effect will result in the identified severity classification, given that the failure mode occurs. It represents the analyst's best judgment as to the likelihood that the loss will occur. For graphical analysis, a criticality matrix may be charted using either $C_m$ or $C_r$ on one axis and severity code on the other.

Critical item/failure mode list

Once the criticality assessment is completed for each failure mode of each item, the FMECA matrix may be sorted by severity and qualitative probability level or quantiative criticality number. This enables the analysis to identify critical items and critical failure modes for which design mitigation is desired.

Recommendations

After performing FMECA, recommendations are made to design to reduce the consequences of critical failures. This may include selecting components with higher reliability, reducing the stress level at which a critical item operates, or adding redundancy or monitoring to the system.

Maintainability analysis

FMECA usually feeds into both Maintainability Analysis and Logistics Support Analysis, which both require data from the FMECA.

FMECA report

A FMECA report consists of system description, ground rules and assumptions, conclusions and recommendations, corrective actions to be tracked, and the attached FMECA matrix which may be in spreadsheet, worksheet, or database form.

Risk priority calculation

RAC CRTA–FMECA and MIL–HDBK–338 both identify Risk Priorty Number (RPN) calculation is an alternate method to criticality analysis. The $RPN$ is a result of a multiplication of detectability $(D)$ x severity $(S)$ x occurrence $(O)$ . Each on a scale from 1 to 10. The highest $RPN$ is $10$ x $10$ x $10$ = $1000$ . This means that this failure is not detectable by inspection, very severe and the occurrence is almost sure. If the occurrence is very sparse, this would be $1$ and the $RPN$ would decrease to $100$ . So, criticality analysis enables to focus on the highest risks.

Advantages and disadvantages

Strengths of FMECA include its comprehensiveness, the systematic establishment of relationships between failure causes and effects, and its ability to point out individual failure modes for corrective action in design. Weaknesses include the extensive labor required, the large number of trivial cases considered, and inability to deal with multiple-failure scenarios or unplanned cross-system effects such as sneak circuits.

According to an FAA research report for commercial space transportation,

Failure Modes, effects, and Criticality Analysis is an excellent hazard analysis and risk assessment tool, but it suffers from other limitations. This alternative does not consider combined failures or typically include software and human interaction considerations. It also usually provides an optimistic estimate of reliability. Therefore, FMECA should be used in conjunction with other analytical tools when developing reliability estimates.^[18]

References

^ Procedures for Performing a Failure Mode, Effects and Criticality Analysis. U.S. Department of Defense. 1949. MIL–P–1629.
^ Neal, R.A. (1962) (pdf). Modes of Failure Analysis Summary for the Nerva B-2 Reactor. Westinghouse Electric Corporation Astronuclear Laboratory. WANL–TNR–042. http://hdl.handle.net/2060/19760069385. Retrieved 2010-03-13.
^ Dill, Robert; et. al. (1963) (pdf). State of the Art Reliability Estimate of Saturn V Propulsion Systems. General Electric Company. RM 63TMP–22. http://hdl.handle.net/2060/19930075105. Retrieved 2010-03-13.
^ (pdf) Procedure for Failure Mode, Effects and Criticality Analysis (FMECA). National Aeronautics and Space Administration. 1966. RA–006–013–1A. http://hdl.handle.net/2060/19700076494. Retrieved 2010-03-13.
^ (pdf) Failure Modes, Effects, and Criticality Analysis (FMECA). National Aeronautics and Space Administration JPL. PD–AD–1307. http://www.klabs.org/DEI/References/design_guidelines/analysis_series/1307.pdf. Retrieved 2010-03-13.
^ Borgovini, Robert; Pemberton, S., Rossi, M. (1993) (pdf). Failure Mode, Effects and Criticality Analysis (FMECA). B. Reliability Analysis Center. p. 5. CRTA–FMECA. http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA278508. Retrieved 2010-03-03.
^ Design Analysis Procedure For Failure Modes, Effects and Criticality Analysis (FMECA). Society for Automotive Engineers. 1967. ARP926.
^ 56 (1985) (pdf). International Electrotechnical Commission. IEC 812. http://webstore.iec.ch/p-preview/info_iec60812%7Bed1.0%7Den_d.img.pdf. Retrieved 2010-03-03.
^ Reliability of Systems, Equipment and Components Part 5: Guide to Failure Modes, Effects and Criticality Analysis (FMEA and FMECA). British Standards Institute. 1991. BS 5760–5.
^ (pdf) Procedures for Performing a Failure Mode, Effects and Criticaility Analysis. A. U.S. Department of Defense. 1980. MIL–HDBK–1629A. https://assist.daps.dla.mil/quicksearch/basic_profile.cfm?ident_number=37027. Retrieved 2010-03-14.
^ "7.8 Failure Mode and Effects Analysis (FMEA)" (pdf). Electronic Reliability Design Handbook. B. U.S. Department of Defense. 1998. MIL–HDBK–338B. https://assist.daps.dla.mil/quicksearch/basic_profile.cfm?ident_number=54022. Retrieved 2010-03-13.
^ Chandler, Gregory; Denson, W., Rossi, M., Wanner, R. (1991) (pdf). Failure Mode/Mechanism Distributions. Reliability Analysis Center. FMD–91. http://handle.dtic.mil/100.2/ADA259655. Retrieved 2010-03-14.
^ Failure Mode/Mechanism Distributions. Reliability Analysis Center. 1997. FMD–97. http://infostore.saiglobal.com/store/Details.aspx?ProductID=554377.
^ (pdf) Standard Practice for System Safety. D. U.S. Department of Defense. 1998. MIL–HDBK–882D. https://assist.daps.dla.mil/quicksearch/basic_profile.cfm?ident_number=36027. Retrieved 2010-03-14.
^ NASA Systems Engineering Handbook. National Aeronautics and Space Administration. SP–610S. http://snebulos.mit.edu/projects/reference/NASA-Generic/NASA-SP-610S.
^ Failure Modes, Effects and Criticality Analysis (FMECA). D. European Space Agency. 1991. ECSS–Q–30–02A.
^ (pdf) Reusable Launch and Reentry Vehicle System Safety Processes. Federal Aviation Administration. 2005. AC 431.35–2A. http://rgl.faa.gov/Regulatory_and_Guidance_Library/rgAdvisoryCircular.nsf/0/e62865e0d3765a1986257052006fc948/$FILE/AC%20431.35-2A.pdf. Retrieved 2010-03-14.
^ (pdf) Research and Development Accomplishments FY 2004. Federal Aviation Administration. 2004. http://www.faa.gov/about/office_org/headquarters_offices/ast/about/media/032504.pdf. Retrieved 2010-03-14.

External links

FMEA and FMECA Information